Creating a training data schema reference - 7.3

Machine Learning

Version: 7.3
Language: English (United States)
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio
This section explains how to create a training data schema reference for developing a machine learning routine.

Procedure

  1. Right-click the HDFS connection you previously created and choose Retrieve Schema.
  2. Navigate to the pre-loaded training data file located at /user/puccini/machinelearning/decisiontrees/marketing/marketing_campaign_train.csv.
  3. Click Next, name the schema, and adjust the data types as needed.
    In this case, the default data types are accurate, so no changes are required.
  4. Click Finish.
  5. Add a tHDFSConfiguration component from the Palette to the design workspace.
  6. Set Property Type to Repository.
  7. Select the HDFS connection you created, MarketingCampaignData.
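
To get a feel for what the schema retrieval in steps 1 to 4 produces, the following is a minimal Python sketch that guesses a column type for each header in a CSV sample. It is an illustration only, not Talend's implementation: the type names, the `infer_type` and `infer_schema` helpers, and the sample columns are all assumptions, and the real columns of marketing_campaign_train.csv may differ.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "String"
    # Try the narrowest numeric type first, then fall back to wider ones.
    for type_name, caster in (("Integer", int), ("Double", float)):
        try:
            for v in non_empty:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "String"


def infer_schema(csv_text, delimiter=","):
    """Map each header name to a type guessed from the sample rows."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    # Transpose rows into columns; with no data rows, every column is empty.
    columns = list(zip(*data)) if data else [[] for _ in header]
    return {name: infer_type(col) for name, col in zip(header, columns)}


# Hypothetical sample; these column names are placeholders, not the
# actual contents of the training file.
SAMPLE = "age,income,jobtype,conversion\n34,52000.5,3,1\n51,61000,1,0\n"

if __name__ == "__main__":
    for name, type_name in infer_schema(SAMPLE).items():
        print(f"{name}: {type_name}")
```

A guess of this kind is only a starting point, which is why step 3 asks you to review the data types before clicking Finish.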