Procedure
-
Double-click tMahoutClustering to open its Component view.
-
From the Schema list, select Built-In, then click the [...] button next to Edit Schema and describe the data structure of the input file.
-
Add eight rows to the schema dialog box and define the input data as shown in the above capture.
The component has one read-only column, clusterID.
- Click OK.
-
In the File Configuration area:
-
Click the [...] button next to Input HDFS file and browse to the file on the Hadoop system that holds the numerical input data you want to cluster.
-
Set the field separator used to separate the columns in the clustered data.
-
In the Cluster columns table, add rows to the table and click in each row to select a column from the input schema.
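As a rough illustration of what the field separator and the Cluster columns table control, the sketch below splits delimited text on a separator and keeps only the columns chosen for clustering. It is plain Python, not the component's own code; the function `select_cluster_columns`, the `;` separator, and the column names `id`, `x` and `y` are hypothetical and stand in for whatever your schema defines.

```python
import csv
import io


def select_cluster_columns(text, separator, column_names, cluster_columns):
    """Parse delimited text and keep only the numeric columns chosen for clustering."""
    # Map each selected column name to its position in the schema.
    indices = [column_names.index(name) for name in cluster_columns]
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    # Skip empty lines and convert the selected fields to floats.
    return [[float(row[i]) for i in indices] for row in reader if row]


# Hypothetical input: an id column followed by two numeric columns.
sample = "1;5.1;3.5\n2;4.9;3.0\n"
points = select_cluster_columns(sample, ";", ["id", "x", "y"], ["x", "y"])
```

Only the columns listed in the last argument reach the clustering step; the `id` column is read but ignored.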
-
In the Clustering Configuration area:
-
From the Clustering Type list, select the algorithm you want to use to cluster the numerical data, Fuzzy K-means in this example.
-
From the Distance Measure list, select the distance measure you want to use for clustering.
-
In the Number of clusters field, enter 3.
-
Leave the values in Max iterations and Convergence delta as they are.
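To make the role of these settings more concrete, here is a minimal sketch of the textbook fuzzy k-means (fuzzy c-means) procedure in plain Python. It is not the Mahout implementation that the component runs. The `fuzziness` exponent of 2, the random initialization, and the values used for `max_iterations` and `convergence_delta` are assumptions chosen for illustration, not the component's defaults. Taking the highest membership as a single cluster label is likewise just one possible way to reduce the result to one ID per row.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance, standing in for the selected distance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_k_means(points, num_clusters, distance=euclidean, max_iterations=10,
                  convergence_delta=0.5, fuzziness=2.0, seed=0):
    """Return (centers, memberships) for a list of numeric points."""
    rng = random.Random(seed)
    # Assumption: start from randomly chosen input points.
    centers = [list(p) for p in rng.sample(points, num_clusters)]
    exponent = 2.0 / (fuzziness - 1.0)
    dims = len(points[0])
    memberships = []
    for _ in range(max_iterations):
        # Membership of each point in each cluster; every row sums to 1.
        memberships = []
        for p in points:
            dists = [max(distance(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                 for d_i in dists]
            )
        # Move each center to the membership-weighted mean of all points.
        new_centers = []
        for k in range(num_clusters):
            weights = [row[k] ** fuzziness for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dims)]
            )
        # Stop early once no center moved farther than the convergence delta.
        shift = max(distance(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < convergence_delta:
            break
    return centers, memberships


# Hypothetical data with three visible groups, clustered into 3 clusters.
data = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0], [9.2, 1.1]]
centers, memberships = fuzzy_k_means(data, num_clusters=3)
# One possible hard label per row: the cluster with the highest membership.
cluster_ids = [row.index(max(row)) for row in memberships]
```

In this sketch, the number of clusters fixes how many centers are maintained, the maximum number of iterations caps the loop, and the convergence delta ends the loop early once the centers have stopped moving by more than that amount.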
-