Scenario: Display the content of an ARFF file - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario describes a two-component Job in which the rows of an ARFF file are read, the delimited data is selected and the output is displayed in the Run view.

An ARFF file is generally made of two parts. The first part describes the data structure: these are the lines that begin with @attribute. The second part contains the raw data, which follows the @data line.
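
To make the two-part layout concrete, the following Python sketch splits a small ARFF text into its attribute declarations and its data rows. The sample content (the weather relation and its attributes) is invented for illustration and is not the file used in this scenario. parse_arff is a hypothetical helper, not part of Talend, and it only handles simple comma-separated values without quoting or sparse rows.

```python
# Hypothetical sample content; not the file used in this scenario.
SAMPLE_ARFF = """\
% comment lines start with a percent sign
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Split ARFF text into (attributes, rows).

    attributes: list of (name, type) tuples taken from the @attribute lines.
    rows: list of lists of strings taken from the lines after @data.
    Simplified on purpose: no quoted values, no sparse rows.
    """
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lowered = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lowered.startswith("@attribute"):
            # "@attribute <name> <type>": the type itself may contain spaces
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lowered.startswith("@data"):
            in_data = True  # everything after this line is raw data
    return attributes, rows


attributes, rows = parse_arff(SAMPLE_ARFF)
print([name for name, _ in attributes])
print(rows[0])
```

In this sketch, each @attribute line corresponds to one column of the schema defined later in the scenario, and each line after @data corresponds to one row read by the input component.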

Dropping and linking components

  1. Drop the tFileInputARFF component from the Palette onto the workspace.

  2. In the same way, drop the tLogRow component.

  3. Right-click tFileInputARFF and select Row > Main from the menu. Then drag the link to tLogRow and click it to create the connection.

Configuring the components

  1. Double-click the tFileInputARFF.

  2. In the Component view, click the browse button next to the File Name field and select your .arff file.

  3. In the Schema field, select Built-In.

  4. Click the [...] button next to Edit schema to add column descriptions corresponding to the file to be read.

  5. Click the [+] button once for each column in the source file, then name each column to match the corresponding attribute.

  6. The Nullable check box is selected by default for every column. Leave it selected for all columns.

  7. Click OK.

  8. In the workspace, double-click the tLogRow to display its Component view.

  9. Click the [...] button next to Edit schema to check that the schema has been propagated. If not, click the Sync columns button.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 to execute your Job.

    The console displays the data contained in the ARFF file, with values separated by a vertical bar (|), the default separator.
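
As a rough illustration of that output, the Python sketch below joins each row's values with a vertical bar. The rows are hypothetical sample data, and format_row is an illustrative helper rather than the code Talend generates. The exact console output depends on the tLogRow settings.

```python
def format_row(values, separator="|"):
    """Join one row's values into a single string using the separator."""
    return separator.join(str(value) for value in values)


# Hypothetical rows standing in for the data section of an ARFF file.
rows = [["sunny", 85, "no"], ["overcast", 83, "yes"]]
for row in rows:
    print(format_row(row))
```

Passing a different separator argument mirrors changing the field separator in the tLogRow Component view.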