Reading and caching the sample data

Procedure

  1. Double-click the first tFileInputDelimited component to open its Component view.
  2. Click the [...] button next to Edit schema and, in the schema dialog box that opens, add two columns, latitude and longitude, both of Double type.
  3. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
  4. Select the Define a storage configuration component check box and select the tHDFSConfiguration component to be used.
    tFileInputDelimited uses this configuration to access the sample data to be used as the training set.
  5. In the Folder/File field, enter the directory where the training set is stored.
  6. Double-click the tReplicate component to open its Component view.
  7. Select the Cache replicated RDD check box and, from the Storage level drop-down list, select Memory only. The sample data is then replicated and kept in memory for use as a test set.
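
The procedure above is configured entirely in the Studio, but the idea behind it can be sketched in a few lines of plain Python: rows are parsed against a two-column schema of doubles, and the parsed result is held in memory so that more than one downstream consumer can reuse it without re-reading the source. This is only an illustration, not the code the Job generates. The GeoPoint type, the read_points helper, the sample coordinates, and the semicolon separator are all assumptions made for the example. In Spark itself, a memory-only cache usually corresponds to the MEMORY_ONLY storage level.

```python
import csv
import io
from typing import List, NamedTuple


class GeoPoint(NamedTuple):
    """One row of sample data: the two Double columns defined in the schema."""
    latitude: float
    longitude: float


def read_points(text: str, delimiter: str = ";") -> List[GeoPoint]:
    """Parse delimited text into typed rows, skipping blank lines.

    The semicolon separator is an assumption; use whatever the input
    component is configured with.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [GeoPoint(float(row[0]), float(row[1])) for row in reader if row]


# Hypothetical sample rows standing in for the files in the training-set folder.
SAMPLE = "48.8566;2.3522\n51.5074;-0.1278\n40.7128;-74.0060\n"

# Parse once and keep the rows in memory, so that both downstream consumers
# reuse the same data instead of reading the source twice.
cached_points = read_points(SAMPLE)
training_input = cached_points
test_input = cached_points
```

Both training_input and test_input refer to the same in-memory list here, which mirrors the point of caching the replicated data: the source is read a single time even though the data feeds two branches.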
