Training the model using Random Forest - 6.5

Machine Learning

Author: Talend Documentation Team
Version: 6.5
Products: Talend Big Data; Talend Big Data Platform; Talend Data Fabric; Talend Real-Time Big Data Platform
Tasks: Data Governance > Third-party systems > Machine Learning components; Data Quality and Preparation > Third-party systems > Machine Learning components; Design and Development > Third-party systems > Machine Learning components
Platform: Talend Studio

Procedure

  1. Double-click tRandomForestModel to open its Component view.
  2. From the Label column list, select the column that provides the classes to be used for classification. In this scenario, it is label, which contains two class names: spam for junk messages and ham for normal messages.
  3. From the Features column list, select the column that provides the feature vectors to be analyzed. In this scenario, it is features_vect, which combines all features.
  4. Select the Save the model on file system check box and, in the HDFS folder field that is displayed, enter the directory in which you want to store the generated model.
  5. In the Number of trees in the forest field, enter the number of decision trees you want tRandomForestModel to build. To find a suitable value, run the current Job several times with different numbers of trees, compare the evaluation results of the models created on each run, and then decide which number to use. In this scenario, enter 20.
    An evaluation Job is presented in one of the following sections.
  6. Leave the other parameters as is.
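
The tuning loop described in step 5 (train with several tree counts, evaluate each model, then pick a value) can be sketched outside the Studio. The following is a minimal illustration only, not what tRandomForestModel does internally: it is a toy pure-Python forest made of single-split trees (decision stumps), trained on synthetic data. All function names (train_stump, train_forest, predict, accuracy, make_toy_data), the candidate tree counts, and the data itself are assumptions made for this example; only the spam and ham class names and the value 20 come from the scenario.

```python
import random
from collections import Counter

def train_stump(rows, labels, rng):
    """Fit a one-split tree on a bootstrap sample, using one random feature."""
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
    feature = rng.randrange(len(rows[0]))        # random feature for this tree
    best = None
    for i in idx:
        threshold = rows[i][feature]
        left = [labels[j] for j in idx if rows[j][feature] <= threshold]
        right = [labels[j] for j in idx if rows[j][feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label

def train_forest(rows, labels, num_trees, seed=0):
    """Build num_trees independent stumps; num_trees plays the role of step 5's field."""
    rng = random.Random(seed)
    return [train_stump(rows, labels, rng) for _ in range(num_trees)]

def predict(forest, row):
    """Classify one feature vector by majority vote over all trees."""
    votes = Counter(lo if row[f] <= t else hi for f, t, lo, hi in forest)
    return votes.most_common(1)[0][0]

def accuracy(forest, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    hits = sum(predict(forest, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def make_toy_data(n, seed):
    """Synthetic stand-in for the label and features_vect columns."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        is_spam = rng.random() < 0.5
        center = 1.0 if is_spam else -1.0
        rows.append([rng.gauss(center, 1.0) for _ in range(4)])
        labels.append("spam" if is_spam else "ham")
    return rows, labels

train_rows, train_labels = make_toy_data(150, seed=1)
test_rows, test_labels = make_toy_data(100, seed=2)

# Train one model per candidate tree count and record its held-out accuracy.
results = {}
for num_trees in (5, 10, 20, 40):
    forest = train_forest(train_rows, train_labels, num_trees)
    results[num_trees] = accuracy(forest, test_rows, test_labels)
    print(num_trees, round(results[num_trees], 3))

best_num_trees = max(results, key=results.get)
print("best number of trees on this toy data:", best_num_trees)
```

In the Job itself, the same comparison is made by changing the Number of trees in the forest field, rerunning the Job, and checking each saved model with the evaluation Job. The value chosen on this toy data has no bearing on the value that suits the spam-filtering scenario.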