Reading and caching the sample data

Machine Learning

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Talend Big Data
EnrichPlatform
Talend Studio

Procedure

  1. Double-click the first tFileInputDelimited component to open its Component view.
  2. Click the [...] button next to Edit schema and, in the schema dialog box that opens, add two columns, latitude and longitude, both of the Double type.
  3. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
  4. Select the Define a storage configuration component check box and select the tHDFSConfiguration component to be used.
    tFileInputDelimited uses this configuration to access the sample data that serves as the training set.
  5. In the Folder/File field, enter the directory where the training set is stored.
  6. Double-click the tReplicate component to open its Component view.
  7. Select the Cache replicated RDD check box and, from the Storage level drop-down list, select Memory only. This way, the sample data is replicated and stored in memory for use as the test set.
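
The effect of this procedure can be illustrated outside the Studio with a small, Spark-free Python sketch. It is only an analogy, not the code the Job generates: it applies a two-column schema (latitude and longitude as doubles) to delimited text and keeps the parsed rows in memory so that a second consumer reuses them instead of reading the source again. The sample values, the ";" separator, and the CachedSample helper are assumptions made for illustration. In Spark terms, the Memory only option corresponds to the MEMORY_ONLY storage level.

```python
import csv
import io

# Hypothetical sample in the two-column layout described above (latitude;longitude).
# The ";" separator and the values are assumptions for illustration only.
SAMPLE = "45.52;-122.68\n45.51;-122.67\n45.53;-122.66\n"


class CachedSample:
    """Reads the delimited source at most once and keeps the parsed rows in memory."""

    def __init__(self, open_source, separator=";"):
        self._open_source = open_source  # callable returning a text stream
        self._separator = separator
        self._rows = None                # filled on first access
        self.read_count = 0              # how many times the source was actually read

    def rows(self):
        if self._rows is None:
            self.read_count += 1
            with self._open_source() as stream:
                reader = csv.reader(stream, delimiter=self._separator)
                # Apply the schema: two columns, both parsed as floats (doubles).
                self._rows = tuple((float(lat), float(lon)) for lat, lon in reader)
        return self._rows


sample = CachedSample(lambda: io.StringIO(SAMPLE))
training_set = sample.rows()  # first consumer triggers the single read
test_set = sample.rows()      # second consumer reuses the in-memory copy
```

Holding the rows in memory trades memory for speed: the source is parsed once, and every later consumer works from the same in-memory copy.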