Training the decision tree model

This section explains how to train your decision tree model.

Procedure

Add a tDecisionTreeModel component to the workspace.
Connect tModelEncoder to tDecisionTreeModel using a Row > Main link.
Double-click tDecisionTreeModel to open its Basic settings view.
In Storage, select the Define a storage configuration component check box and select the HDFS storage component from the list.
Choose the schema you created earlier.
In Features Column, choose MyFeatures.
In Label Column, choose MyLabels.
In Model location, select the Save the model on file system (only for Spark 1.4 or higher) check box and enter the HDFS path where the trained model is to be saved.
In this example: /user/puccini/machinelearning/decisiontrees/marketing/decisiontree.model.
Leave the remaining settings at their default values.
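
For readers who want to relate these settings to Spark itself, the following is a minimal sketch in PySpark, not the code the Studio generates. It assumes a classification task, PySpark 2.0 or later, and an input DataFrame that already contains the MyFeatures vector column and a numeric MyLabels column produced by the encoding step. The names SETTINGS and train_and_save are hypothetical and used only for illustration.

```python
# Values taken from the procedure above; the dictionary keys are hypothetical.
SETTINGS = {
    "features_col": "MyFeatures",
    "label_col": "MyLabels",
    "model_path": "/user/puccini/machinelearning/decisiontrees/marketing/decisiontree.model",
}


def train_and_save(encoded_df, settings=SETTINGS):
    """Fit a decision tree on an already-encoded DataFrame and save it.

    Assumes encoded_df has a vector features column and a numeric label
    column, and that the model path is writable (for example on HDFS).
    """
    # Imported lazily so the settings can be inspected without Spark installed.
    from pyspark.ml.classification import DecisionTreeClassifier

    tree = DecisionTreeClassifier(
        featuresCol=settings["features_col"],
        labelCol=settings["label_col"],
    )
    # All other hyperparameters keep their defaults, as in the last step.
    model = tree.fit(encoded_df)
    # Overwrite any model already stored at the target path.
    model.write().overwrite().save(settings["model_path"])
    return model
```

Calling train_and_save(encoded_df) with such a DataFrame would return the fitted model after writing it to the configured path.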

[Figure: the Job configuration]
Click the Run tab and go to Spark Configuration.
Select the Use local mode check box.

You can also run this Job directly on the Hadoop cluster, which is the most likely scenario in production. To do so, make a few small adjustments to the Spark configuration of the Job, starting with clearing the Use local mode check box.
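
In plain Spark terms, the difference between the two run modes largely comes down to the master URL the session is created with. The sketch below illustrates this under stated assumptions: the cluster is assumed to be managed by YARN, and the function names spark_master and build_session are hypothetical. A real cluster run usually needs further connection settings that are not shown here.

```python
def spark_master(use_local_mode):
    """Return a Spark master URL for the chosen run mode (illustrative only)."""
    # "local[*]" runs Spark in-process using all local cores.
    # "yarn" is an assumption about the cluster manager in use.
    return "local[*]" if use_local_mode else "yarn"


def build_session(use_local_mode, app_name="decision-tree-training"):
    """Create a SparkSession for local or cluster execution."""
    # Imported lazily so spark_master can be used without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(spark_master(use_local_mode))
        .appName(app_name)
        .getOrCreate()
    )
```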
