Training the decision tree model - 7.3

Machine Learning

Version
7.3
Language
English (United States)
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Module
Talend Studio
This section explains how to train your decision tree model.

Procedure

  1. Add a tDecisionTreeModel component from the Palette to the design workspace.
  2. Connect tModelEncoder to tDecisionTreeModel with a Main link.
  3. Double-click tDecisionTreeModel to open its Component view.
  4. Select the check box below Storage to choose HDFS storage.
  5. Choose the schema you created earlier.
  6. In Features Column, choose MyFeatures.
  7. In Label Column, choose MyLabels.
  8. Select the check box below Model location and set the HDFS path where the model is saved to /user/puccini/machinelearning/decisiontrees/marketing/decisiontree.model.
  9. Leave the remaining settings at their default values.

    Your final job should now end with tModelEncoder connected to tDecisionTreeModel by a Main link.

  10. Click the Run tab and go to Spark Configuration.
  11. Select the Use local mode check box.
    You can also run this job directly on the Hadoop cluster, which is the most likely scenario in a production setting. For that, you need to make a few small adjustments to how the job runs, including clearing the Use local mode check box.
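
For readers who want to see what the settings above correspond to outside the Studio, the following is a rough PySpark analogue. It is not the code that Talend Studio generates. The class DecisionTreeJobSettings and the functions spark_master, build_session and train_and_save are hypothetical names introduced only for this sketch, and the use of "yarn" as the master for a cluster run is an assumption about how the Hadoop cluster is set up. The sketch also assumes that the training DataFrame already holds a vector column named MyFeatures and a numeric column named MyLabels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionTreeJobSettings:
    """Hypothetical container mirroring the choices made in the procedure."""
    features_col: str = "MyFeatures"  # Features Column (step 6)
    label_col: str = "MyLabels"       # Label Column (step 7)
    # Model location (step 8)
    model_path: str = (
        "/user/puccini/machinelearning/decisiontrees/marketing/decisiontree.model"
    )
    use_local_mode: bool = True       # Use local mode check box (step 11)


def spark_master(settings: DecisionTreeJobSettings) -> str:
    """Pick the Spark master URL: in-process for local mode, YARN otherwise (assumed)."""
    return "local[*]" if settings.use_local_mode else "yarn"


def build_session(settings: DecisionTreeJobSettings):
    """Create a SparkSession for the chosen mode."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(spark_master(settings))
        .appName("decision-tree-training-sketch")
        .getOrCreate()
    )


def train_and_save(settings: DecisionTreeJobSettings, training_df):
    """Fit a decision tree on an already-encoded DataFrame and persist the model."""
    from pyspark.ml.classification import DecisionTreeClassifier

    classifier = DecisionTreeClassifier(
        featuresCol=settings.features_col,
        labelCol=settings.label_col,
    )
    model = classifier.fit(training_df)
    # Overwrite any model left at this path by an earlier run.
    model.write().overwrite().save(settings.model_path)
    return model


if __name__ == "__main__":
    settings = DecisionTreeJobSettings()
    print(spark_master(settings))  # local[*]
```

Keeping the mode switch in one small function mirrors the single check box in the Spark Configuration view: a cluster run changes the master (and whatever cluster settings go with it), while the feature column, label column and model path stay the same.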