This section explains how to create a schema reference for the training data, which you will use to develop a machine learning routine.
- Right-click the HDFS connection you previously created and choose Retrieve Schema.
Navigate to the pre-loaded training data file located at /user/puccini/machinelearning/decisiontrees/marketing/marketing_campaign_train.csv.
Click Next, name the schema and adjust the data types as needed.
In this case, the defaults are accurate.
- Click Finish.
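Retrieve Schema reads the file's header and sample rows, then proposes a data type for each column, which is why step 4 asks you to review those types. Talend's actual inference logic is internal to the Studio. The following is only a hypothetical sketch of that kind of guessing. It assumes a CSV with a header row and uses Talend-style type names (Integer, Double, String). The sample rows are made up and are not the contents of marketing_campaign_train.csv.

```python
import csv
import io


def infer_type(values):
    """Guess a Talend-style type name from a column's sample values (hypothetical rule)."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "String"  # nothing to go on, so fall back to the most general type

    def all_parse(parser):
        # True only if every non-empty sample value converts with the given parser
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "Integer"
    if all_parse(float):
        return "Double"
    return "String"


def infer_schema(csv_text, sample_rows=100):
    """Return (column name, guessed type) pairs using the header and up to sample_rows rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for _, row in zip(range(sample_rows), reader)]
    # Transpose rows into columns; an empty file yields empty columns
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return [(name, infer_type(list(col))) for name, col in zip(header, columns)]


sample = "age,balance,job\n34,1200.50,admin\n51,300,technician\n"
print(infer_schema(sample))
```

If a guessed type is wrong (for example, a numeric code column that should stay a String), that is the point at which you would change it in the schema dialog.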
- Drag a tHDFSConfiguration component from the Palette onto the design workspace.
- Set Property Type to Repository.
Select the HDFS connection you created, MarketingCampaignData.
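Setting Property Type to Repository makes the component reuse the connection details stored with MarketingCampaignData instead of requiring you to re-enter them, as you would with Built-In. The sketch below models that lookup under stated assumptions: the `HdfsConnection` class, the `REPOSITORY` dictionary, the `configure` function, and the NameNode URI `hdfs://namenode.example.com:8020` are all hypothetical placeholders and do not reflect Talend's internal API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class HdfsConnection:
    """Hypothetical stand-in for the metadata of a repository HDFS connection."""
    name: str
    namenode_uri: str  # placeholder cluster address, not from the tutorial

    def resolve(self, path: str) -> str:
        """Build a fully qualified URI for an absolute HDFS path."""
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute HDFS path, got {path!r}")
        return self.namenode_uri.rstrip("/") + path


# Hypothetical repository keyed by connection name
REPOSITORY: Dict[str, HdfsConnection] = {
    "MarketingCampaignData": HdfsConnection(
        "MarketingCampaignData", "hdfs://namenode.example.com:8020"
    ),
}


def configure(property_type: str, connection_name: Optional[str] = None,
              built_in: Optional[HdfsConnection] = None) -> HdfsConnection:
    """Return connection settings the way the Property Type choice implies."""
    if property_type == "Repository":
        # Reuse the stored connection by name
        if connection_name not in REPOSITORY:
            raise KeyError(f"no repository connection named {connection_name!r}")
        return REPOSITORY[connection_name]
    if property_type == "Built-In":
        # Settings must be supplied directly on the component
        if built_in is None:
            raise ValueError("Built-In requires explicit connection settings")
        return built_in
    raise ValueError(f"unknown property type {property_type!r}")


conn = configure("Repository", "MarketingCampaignData")
print(conn.resolve("/user/puccini/machinelearning/decisiontrees/marketing/marketing_campaign_train.csv"))
```

The design choice this illustrates is centralization: if the cluster address changes, only the repository entry needs updating, and every component that references it by name picks up the change.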