Reading and caching the sample data - 7.3

Machine Learning

Version
7.3
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-21

Procedure

  1. Double-click the first tFileInputDelimited component to open its Component view.
  2. Click the [...] button next to Edit schema and, in the pop-up schema dialog box, define the schema by adding two columns, latitude and longitude, both of type Double.
  3. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
  4. Select the Define a storage configuration component check box and select the tHDFSConfiguration component to be used.
    tFileInputDelimited uses this configuration to access the sample data to be used as the training set.
  5. In the Folder/File field, enter the directory where the training set is stored.
  6. Double-click the tReplicate component to open its Component view.
  7. Select the Cache replicated RDD check box and, from the Storage level drop-down list, select Memory only. The sample data is then replicated and stored in memory for use as the test set.
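
The effect of the procedure above can be illustrated with a small plain-Python analogue. This is only a sketch of the idea, not the code Talend Studio generates: the sample rows, the semicolon separator, and the names read_points and CachedDataset are all assumptions made for illustration. It parses delimited text into a two-column schema of doubles (latitude, longitude) and keeps the parsed rows in memory so that a second consumer reuses them instead of reading the source again.

```python
import csv
import io

# Hypothetical sample rows; in the Job, the real training set is read from HDFS.
SAMPLE = "48.8566;2.3522\n51.5074;-0.1278\n"


def read_points(text, delimiter=";"):
    """Parse delimited text into (latitude, longitude) pairs of floats.

    The ';' separator is an assumption; use whatever separator the file has.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [(float(row[0]), float(row[1])) for row in reader if row]


class CachedDataset:
    """Load the rows once, then serve the in-memory copy to every consumer."""

    def __init__(self, loader):
        self._loader = loader
        self._rows = None
        self.loads = 0  # counts how many times the source was actually read

    def rows(self):
        if self._rows is None:
            self._rows = self._loader()
            self.loads += 1
        return self._rows


dataset = CachedDataset(lambda: read_points(SAMPLE))
training_rows = dataset.rows()  # first access parses the data
test_rows = dataset.rows()      # second access reuses the cached rows
```

In the Job itself, the equivalent behavior comes from the Cache replicated RDD check box with the Memory only storage level, which keeps the replicated data in memory rather than recomputing it for each downstream flow.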