This section explains how to create a schema reference for the training data used to develop a machine learning routine.
Procedure
-
Right-click the HDFS connection you previously created and choose Retrieve Schema.
-
Navigate to the pre-loaded training data file located at /user/puccini/machinelearning/decisiontrees/marketing/marketing_campaign_train.csv.
-
Click Next, name the schema and adjust the data types as needed.
In this case, the defaults are accurate.
-
Click Finish.
-
Add a tHDFSConfiguration component from the Palette to the design workspace.
-
Set Property Type to Repository.
-
Select the HDFS connection you created, MarketingCampaignData.
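
The schema retrieval step above reads column names from the file and proposes a data type for each column, which you then confirm or adjust. As a rough illustration of that kind of inference (not the Studio's actual implementation), the following Python sketch guesses a type for each column of a small CSV sample. The column names and values in `sample` are hypothetical and are not taken from marketing_campaign_train.csv. The helper names `infer_type` and `infer_schema` are also made up for this example.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "String"
    # Try the narrowest type first, then widen.
    for name, cast in (("Integer", int), ("Double", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "String"


def infer_schema(csv_text, delimiter=","):
    """Map each header column to a guessed type, based on all data rows."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    rows = list(reader)
    # Transpose rows into columns; a file with no data rows yields empty columns.
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


# Hypothetical sample data, for illustration only.
sample = (
    "age,balance,job,converted\n"
    "35,1200.50,admin,1\n"
    "42,87.00,technician,0\n"
)

schema = infer_schema(sample)
for name, type_name in schema:
    print(f"{name}: {type_name}")
```

A sketch like this can serve as a quick sanity check on a local copy of the training file before you accept the default types in the wizard. A column that falls back to String when you expected a number usually points to stray text or a wrong delimiter.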