This section explains how to train your decision tree model.
Procedure
- Add a tDecisionTreeModel component to the workspace.
- Connect tModelEncoder to tDecisionTreeModel with a Main row.
- Double-click tDecisionTreeModel to open the Basic settings.
- In Storage, select the Define a storage configuration component check box and choose the HDFS storage.
- Choose the schema you created earlier.
- In Features Column, choose MyFeatures.
- In Label Column, choose MyLabels.
- In Model location, select the Save the model on file system (only for Spark 1.4 or higher) check box and enter the path to the HDFS file system.
  In this example: /user/puccini/machinelearning/decisiontrees/marketing/decisiontree.model.
- Leave the default value for the rest of the settings.
Here is the Job configuration.
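For readers who think in Spark terms, the component settings above can be approximated with the PySpark ML API. This is only a rough sketch, not the code the Job generates: it assumes the component behaves like Spark's DecisionTreeClassifier, and the names estimator_params and train_and_save are hypothetical helpers introduced for illustration. The column names and the model path are the ones used in this example.

```python
# Values taken from the procedure above.
FEATURES_COL = "MyFeatures"
LABEL_COL = "MyLabels"
MODEL_PATH = "/user/puccini/machinelearning/decisiontrees/marketing/decisiontree.model"


def estimator_params():
    """Column settings mirroring Features Column and Label Column (hypothetical helper)."""
    return {"featuresCol": FEATURES_COL, "labelCol": LABEL_COL}


def train_and_save(training_df, model_path=MODEL_PATH):
    """Fit a decision tree on an already-encoded DataFrame and save the model.

    Assumption: training_df already holds a vector column named MyFeatures
    and a numeric label column named MyLabels.
    """
    # Imported lazily so the settings above can be inspected without Spark installed.
    from pyspark.ml.classification import DecisionTreeClassifier

    # All other hyperparameters keep their defaults, as in the last step.
    classifier = DecisionTreeClassifier(**estimator_params())
    model = classifier.fit(training_df)
    # save() raises an error if something already exists at model_path.
    model.save(model_path)
    return model
```

Calling train_and_save(df) on a suitable DataFrame would write the trained model under the path entered in Model location.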
- Click the Run tab and go to Spark Configuration.
- Select the Use local mode check box.
  You can also run this Job directly on the Hadoop cluster, which is the most likely scenario in a production setting. For that, you need to make a few small adjustments to how the Job runs, including clearing the Use local mode check box.
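In plain Spark, the choice between local mode and a cluster comes down largely to the spark.master property. The sketch below illustrates that idea only; the function name spark_master is hypothetical, and the cluster is assumed to be managed by YARN, which this section does not state. It does not cover the other adjustments a cluster run needs.

```python
def spark_master(use_local_mode, local_threads="*"):
    """Return a spark.master value for local or cluster execution (hypothetical helper)."""
    if use_local_mode:
        # local[*] runs Spark in-process with one worker thread per core.
        return f"local[{local_threads}]"
    # Assumption: the Hadoop cluster schedules Spark through YARN.
    return "yarn"


# Roughly what selecting or clearing the Use local mode check box changes.
local_settings = {"spark.master": spark_master(True)}
cluster_settings = {"spark.master": spark_master(False)}
```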