Creating a training data schema reference - Cloud - 8.0

Machine Learning

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2024-02-20

This section explains how to create a training data schema reference to develop a machine learning routine.

Procedure

  1. Right-click the HDFS connection you previously created and choose Retrieve Schema.
  2. Navigate to the pre-loaded training data file.
    In this example, the file is /user/puccini/machinelearning/decisiontrees/marketing/marketing_campaign_train.csv.
  3. Click Next, name the schema, and adjust the data types as needed.
    In this example, the default data types are accurate.
  4. Click Finish.
  5. Add a tHDFSConfiguration component to the workspace.
  6. Set Property Type to Repository.
  7. Select the HDFS connection you created, MarketingCampaignData.
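
To give a rough idea of what retrieving a schema from a delimited file involves, the following Python sketch reads a header row and guesses a data type for each column from the sample values. It is only an illustration and does not reproduce how Talend Studio performs the retrieval. The function names infer_type and infer_schema, the type labels, and the sample columns (age, income, region, converted) are all hypothetical; the actual columns of marketing_campaign_train.csv are not listed in this procedure.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type label that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "String"
    # Try the stricter type first, then fall back to the looser one.
    for label, cast in (("Integer", int), ("Double", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "String"


def infer_schema(csv_text, delimiter=","):
    """Return a list of (column name, type label) pairs for CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    # Transpose the data rows so each column can be inspected as a whole.
    columns = list(zip(*data)) if data else [[] for _ in header]
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


# Hypothetical sample rows, not taken from the real training file.
SAMPLE = (
    "age,income,region,converted\n"
    "34,52000.5,north,1\n"
    "45,61000,south,0\n"
)

schema = infer_schema(SAMPLE)
for name, type_label in schema:
    print(f"{name}: {type_label}")
```

A guess based on a few rows can be wrong, for example when a numeric-looking column is really an identifier, which is why the wizard lets you adjust the data types before you click Finish.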