Training the decision tree model - Cloud - 8.0

Machine Learning

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2024-02-20

This section explains how to train your decision tree model.

Procedure

  1. Add a tDecisionTreeModel component to the workspace.
  2. Connect tModelEncoder to tDecisionTreeModel with a Main row.
  3. Double-click tDecisionTreeModel to open the Basic settings.
  4. In Storage, select the Define a storage configuration component check box and select the HDFS storage configuration component.
  5. Choose the schema you created earlier.
  6. In Features Column, choose MyFeatures.
  7. In Label Column, choose MyLabels.
  8. In Model location, select the Save the model on file system (only for Spark 1.4 or higher) check box and enter the HDFS path where the model is to be saved.
    In this example: /user/puccini/machinelearning/decisiontrees/marketing/decisiontree.model.
  9. Leave the default values for the remaining settings.

    [Figure: the Job configuration]

  10. Click the Run tab and go to Spark Configuration.
  11. Select the Use local mode check box.
    You can also run this Job directly on the Hadoop cluster, which is the most likely scenario in production. To do so, adjust how the Job runs, starting with clearing the Use local mode check box.
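
For readers who think in code, the settings chosen in the procedure above can be summarized in a short Python sketch. It is not the code that Talend Studio generates. It only mirrors the column names and the model path from this example, and it assumes that PySpark's DecisionTreeClassifier is a reasonable stand-in for tDecisionTreeModel. The names DecisionTreeSettings, spark_master and train_and_save are hypothetical, and the "yarn" master is an assumption about the cluster, not something stated in this procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionTreeSettings:
    """Hypothetical container mirroring the Basic settings used in this example."""
    features_col: str = "MyFeatures"   # Features Column (step 6)
    label_col: str = "MyLabels"        # Label Column (step 7)
    # Model location (step 8), the HDFS path used in this example
    model_path: str = (
        "/user/puccini/machinelearning/decisiontrees/marketing/decisiontree.model"
    )
    local_mode: bool = True            # Use local mode check box (step 11)


def spark_master(settings: DecisionTreeSettings) -> str:
    """Return a Spark master URL for the chosen run mode.

    "yarn" is an assumed cluster manager; the procedure does not name one.
    """
    return "local[*]" if settings.local_mode else "yarn"


def train_and_save(training_df, settings: DecisionTreeSettings):
    """Fit a decision tree on an encoded DataFrame and persist the model.

    training_df is assumed to already contain the vector column named by
    features_col and the numeric column named by label_col, which is the
    role tModelEncoder plays upstream in the Job.
    """
    # Imported here so the settings above can be inspected without PySpark.
    from pyspark.ml.classification import DecisionTreeClassifier

    classifier = DecisionTreeClassifier(
        featuresCol=settings.features_col,
        labelCol=settings.label_col,
    )
    model = classifier.fit(training_df)
    # Overwrite any model left at the same location by a previous run.
    model.write().overwrite().save(settings.model_path)
    return model


if __name__ == "__main__":
    settings = DecisionTreeSettings()
    print(spark_master(settings), settings.model_path)
```

Passing local_mode=False to DecisionTreeSettings corresponds to clearing the Use local mode check box. A real cluster run would need further Spark configuration that this sketch leaves out.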