This section explains how to create a schema reference for the training data, which you use to develop a machine learning routine.
Procedure
- Right-click the HDFS connection you previously created and choose Retrieve Schema.
- Navigate to the pre-loaded training data file.
  In this example, /user/puccini/machinelearning/decisiontrees/marketing/marketing_campaign_train.csv.
- Click Next, name the schema, and adjust the data types as needed.
  In this example, the defaults are accurate.
- Click Finish.
- Add a tHDFSConfiguration component to the workspace.
- Set Property Type to Repository.
- Select the HDFS connection you created, MarketingCampaignData.
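
To make the Retrieve Schema step more concrete, the following is a minimal Python sketch of what schema inference over a delimited file roughly involves: read the header row, sample the data rows, and guess the narrowest type that fits each column. This is not how the Studio implements the wizard. The function names `infer_type` and `infer_schema`, the type labels, and the sample columns are all hypothetical, since this section does not show the actual contents of marketing_campaign_train.csv.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "String"
    # Try the narrowest type first and widen only when a value fails to parse.
    for name, cast in (("Integer", int), ("Double", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "String"


def infer_schema(text, delimiter=","):
    """Map each header column to a guessed type, based on the sampled rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    # Transpose rows into columns; with no data rows every column is empty.
    columns = list(zip(*data)) if data else [()] * len(header)
    return {name: infer_type(col) for name, col in zip(header, columns)}


# Hypothetical sample; the real training file's columns are not shown here.
SAMPLE = "age,income,region,converted\n34,52000.5,north,1\n45,61000,south,0\n"

if __name__ == "__main__":
    for column, guessed in infer_schema(SAMPLE).items():
        print(f"{column}: {guessed}")
```

Guesses of this kind depend on the rows that were sampled, which is why the wizard lets you adjust the data types before you click Finish.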