7.17 kmeans pipelines
20220316
As with all mlhub commands, a goal is to provide powerful combinations of commands through pipelines. We have already seen this in the chapter, where we processed a csv file through a number of steps to normalise the columns, then piped the csv data into the train command followed by the predict command to output a csv file with each observation labelled with a cluster number. Collected below are sample pipelines that illustrate different data flows.
The output will be similar to the following:
sepal_length,sepal_width,petal_length,petal_width,label
5.0,3.6,1.4,0.2,1
7.7,3.8,6.7,2.2,0
6.1,3.0,4.9,1.8,2
5.4,3.7,1.5,0.2,2
...
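Because the labelled output is plain csv, it can be passed on to standard tools as a further pipeline step. As a minimal sketch, assuming only awk and sort are available, we can count how many observations fall into each cluster. The four data rows are copied from the sample output above, and the scratch file and the variable names labelled and cluster_counts are purely illustrative.

```shell
# Sample rows copied from the output above, saved to a scratch file.
labelled=$(mktemp)
cat > "$labelled" <<'EOF'
sepal_length,sepal_width,petal_length,petal_width,label
5.0,3.6,1.4,0.2,1
7.7,3.8,6.7,2.2,0
6.1,3.0,4.9,1.8,2
5.4,3.7,1.5,0.2,2
EOF

# Skip the header, count each value of the last column (the label),
# and sort so the clusters are listed in order.
cluster_counts=$(
  awk -F, 'NR > 1 { n[$NF]++ } END { for (k in n) print k "," n[k] }' "$labelled" |
  sort
)
echo "$cluster_counts"

# Remove the scratch file.
rm -f "$labelled"
```

In a real pipeline the scratch file would not be needed: the same awk command could read the output of the predict command directly from standard input.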
To visualise the final clustering with a popup display of the result:
We can include within the pipeline a normalise operation:
cat wine.csv |
ml normalise kmeans |
tee norm.csv |
ml train kmeans 4 |
ml predict kmeans norm.csv |
mlr --csv cut -f label |
paste -d"," wine.csv -
After normalising the input dataset the result is saved to a file norm.csv using tee whilst piping the same data on to the next command to train a clustering. We save to file since we’d like to predict the clusters for each of the normalised observations, then map them back to the original observations. This is accomplished using a combination of mlr to cut the label column from the csv output from the predict command, and then we paste that label column to the original wine.csv.
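The save, cut and paste pattern described here can be tried without mlhub installed. The following is only a sketch under stated assumptions: awk stands in for the normalise, train and predict commands (it simply rescales one column and labels rows by a fixed threshold, which is not clustering), cut stands in for mlr, and the file orig.csv with its two columns is made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# A tiny, made-up stand-in for wine.csv.
printf 'alcohol,ash\n14.2,2.4\n12.3,1.9\n13.6,2.6\n' > "$workdir/orig.csv"

# Stage 1 (stand-in for normalise): rescale the first column.
# Stage 2: tee saves the intermediate data while passing it on.
# Stage 3 (stand-in for train and predict): append a label column.
# Stage 4: keep only the label column (cut here, mlr in the text).
# Stage 5: paste the labels beside the original, unscaled rows.
combined=$(
  awk -F, -v OFS=, 'NR == 1 { print; next } { $1 = $1 / 10; print }' "$workdir/orig.csv" |
  tee "$workdir/norm.csv" |
  awk -F, -v OFS=, 'NR == 1 { print $0, "label"; next } { print $0, ($1 > 1.3 ? 1 : 0) }' |
  cut -d, -f3 |
  paste -d, "$workdir/orig.csv" -
)
echo "$combined"

# Remove the scratch directory.
rm -rf "$workdir"
```

The paste step relies on the rows arriving in the same order as in the original file, since the labels are matched to observations purely by position.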
Once we have the resulting model and the predictions made on the original data, we can visualise the result as part of a pipeline, whilst also using tee to save the clustering to file:
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0