This section explains how to train your decision tree model.
Procedure
- Add a tDecisionTreeModel component from the Palette to the design workspace.
- Connect tModelEncoder to tDecisionTreeModel with a Main link.
- Double-click tDecisionTreeModel and choose the Component view.
- Select the check box below Storage to choose HDFS storage.
- Choose the schema you created earlier.
- In Features Column, choose MyFeatures.
- In Label Column, choose MyLabels.
- Select the check box below Model location and specify where the model is saved on the HDFS file system: /user/puccini/machinelearning/decisiontrees/marketing/decisiontree.model.
- Leave the default values for the rest of the settings.
Your final Job should look as follows.
- Click the Run tab and go to Spark Configuration.
- Select the Use local mode check box.
You can also run this Job directly on the Hadoop cluster, which is the most likely scenario in a production setting. For that, you need to make a few small adjustments to how the Job runs, including clearing the Use local mode check box.
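For readers who want to relate these settings to Spark itself, the following is a minimal PySpark sketch of roughly what the configured component amounts to. It is not the code that the Studio generates. It assumes that the incoming data is a Spark DataFrame whose MyFeatures column is a feature vector and whose MyLabels column holds numeric class labels, and that the cluster is managed by YARN. The function names `spark_master` and `train_and_save` are hypothetical and used only for illustration.

```python
# Model location taken from the procedure above.
MODEL_PATH = "/user/puccini/machinelearning/decisiontrees/marketing/decisiontree.model"


def spark_master(use_local_mode: bool) -> str:
    """Map the Use local mode check box to a Spark master URL.

    Assumption: the Hadoop cluster is managed by YARN.
    """
    return "local[*]" if use_local_mode else "yarn"


def train_and_save(training_df, model_path: str = MODEL_PATH):
    """Train a decision tree on training_df and save it to model_path.

    Assumption: training_df already contains a vector column MyFeatures
    and a numeric label column MyLabels.
    """
    # Imported here so that spark_master() can be used without Spark installed.
    from pyspark.ml.classification import DecisionTreeClassifier

    # All other parameters keep their defaults, as in the procedure.
    tree = DecisionTreeClassifier(featuresCol="MyFeatures", labelCol="MyLabels")
    model = tree.fit(training_df)

    # Overwrite any model previously saved at the same location.
    model.write().overwrite().save(model_path)
    return model
```

In this sketch, a session for a local run would be created with `SparkSession.builder.master(spark_master(True)).getOrCreate()`, and passing `False` instead corresponds to clearing the Use local mode check box. A path without a scheme resolves against the default file system of the Spark configuration, so it points to HDFS only when that configuration does.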