
Creating a training data schema reference

This section explains how to create a training data schema reference that you will use to develop a machine learning routine.

Procedure

  1. Right-click the HDFS connection you previously created and choose Retrieve Schema.
  2. Navigate to the pre-loaded training data file.
    In this example, the file is /user/puccini/machinelearning/decisiontrees/marketing/marketing_campaign_train.csv.
  3. Click Next, name the schema and adjust the data types as needed.
    In this example, the default data types are accurate.
  4. Click Finish.
  5. Add a tHDFSConfiguration component to the workspace.
  6. Set Property Type to Repository.
  7. Select the HDFS connection you created, MarketingCampaignData.
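
To sanity-check the data types proposed in step 3, it can help to look at a small sample of the CSV outside the Studio. The following is a minimal Python sketch, not the logic the Retrieve Schema wizard uses: it guesses a type for each column by trying to parse every value as an integer, then as a floating-point number, and falls back to String otherwise. The column names and rows in SAMPLE are hypothetical and do not come from marketing_campaign_train.csv; the function names infer_type and infer_schema are illustrative only.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "String"
    # Try the narrowest type first, then widen.
    for name, caster in (("Integer", int), ("Double", float)):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "String"


def infer_schema(csv_text, delimiter=","):
    """Map each header column to a guessed type based on the data rows."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    # Transpose rows into columns; handle a header-only file.
    columns = list(zip(*data)) if data else [()] * len(header)
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


# Hypothetical sample; replace with a few lines copied from your own file.
SAMPLE = """age,income,segment,converted
34,52000.50,A,1
45,61000,B,0
"""

if __name__ == "__main__":
    for name, type_name in infer_schema(SAMPLE):
        print(f"{name}: {type_name}")
```

If the guesses from a sketch like this differ from what the wizard proposes, that is a hint to review the affected columns in step 3 before clicking Finish.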
