Scenario: Iterate on files and merge the content - 6.3

Talend Open Studio for Big Data Components Reference Guide


The following Job iterates over a list of files, merges their content, and displays the resulting two-column output on the console.

Dropping and linking the components

  1. Drop the following components onto the design workspace: tFileList, tFileInputDelimited, tUnite and tLogRow.

  2. Connect tFileList to tFileInputDelimited using an Iterate connection, then connect the remaining components in sequence (tFileInputDelimited to tUnite, and tUnite to tLogRow) using Row > Main links.
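
As a rough mental model, an Iterate connection behaves like a loop that runs the downstream subjob once per file, whereas a Row > Main link passes records from one component to the next. The sketch below illustrates that difference in plain Java. It is not the code Talend generates: the class name IterateSketch, the method listFiles, the sample file names and the semicolon separator are all assumptions made for the illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not the code Talend generates.
class IterateSketch {

    // Plays the role of tFileList: returns the regular files of a directory, sorted by path.
    static List<Path> listFiles(Path directory) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    files.add(entry);
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Sample input created on the fly (file names and content are invented).
        Path directory = Files.createTempDirectory("scores");
        Files.write(directory.resolve("a.csv"), List.of("France;10"));
        Files.write(directory.resolve("b.csv"), List.of("Spain;8"));

        // Iterate connection: one pass of the downstream subjob per file.
        for (Path file : listFiles(directory)) {
            // Row > Main link: each record read from the file flows to the next step.
            for (String record : Files.readAllLines(file)) {
                System.out.println(file.getFileName() + " -> " + record);
            }
        }
    }
}
```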

Configuring the components

  1. In the Basic settings view of tFileList, browse to the directory where the files to merge are stored.

    Each file is a simple list of countries and their respective scores.

  2. In the Case Sensitive field, select Yes so that letter case is taken into account.

  3. Select the tFileInputDelimited component, and display this component's Basic settings view.

  4. In the File Name/Stream field, press Ctrl+Space to open the variable completion list and select tFileList.CURRENT_FILEPATH from the global variables. This makes the component process, in turn, every file in the directory defined in tFileList.

  5. Click the Edit Schema button and manually define the two-column schema to reflect the content of the input files.

    For this example, the 2 columns are Country and Points. They are both nullable. The Country column is of String type and the Points column is of Integer type.

  6. Click OK to validate the settings and, when prompted, accept to propagate the schema throughout the Job.

  7. Then select the tUnite component and display the Component view. Notice that the output schema strictly reflects the input schema and is read-only.

  8. In the Basic settings view of tLogRow, select the Table option so that the output values are displayed in a readable table.
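
Two of the settings above can be pictured in plain Java. Selecting tFileList.CURRENT_FILEPATH typically inserts a lookup in the Job's globalMap, along the lines of ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")), where tFileList_1 is assumed to be the unique name of the component. The two nullable columns correspond to a String field and a boxed Integer field. The sketch below is only an illustration under those assumptions: the class SchemaSketch, the method parseLine, the sample path, the semicolon separator and the rule that empty fields become null are hypothetical choices, not Talend's generated code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for part of a Job; not Talend-generated code.
class SchemaSketch {

    // One record of the two-column schema; both fields may be null (nullable columns).
    static final class Row {
        final String country;  // String column
        final Integer points;  // Integer column, boxed so that it can hold null

        Row(String country, Integer points) {
            this.country = country;
            this.points = points;
        }
    }

    // Parses one delimited line; in this sketch, missing or empty fields become null.
    static Row parseLine(String line, String separator) {
        String[] fields = line.split(Pattern.quote(separator), -1);
        String country = fields.length > 0 && !fields[0].isEmpty() ? fields[0] : null;
        Integer points = fields.length > 1 && !fields[1].trim().isEmpty()
                ? Integer.valueOf(fields[1].trim())
                : null;
        return new Row(country, points);
    }

    public static void main(String[] args) {
        // Simulates the value tFileList would publish for the current iteration.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILEPATH", "/tmp/scores/europe.csv"); // invented path

        String currentFile = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
        System.out.println("Reading " + currentFile);

        Row row = parseLine("France;10", ";");
        System.out.println(row.country + " | " + row.points);
    }
}
```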

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6, or click Run in the Run view, to execute the Job.

    The console shows the data from all the files, merged into a single table.
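
To make the expected result more concrete, the following plain-Java sketch mimics the last two stages of the Job: a merge step that appends the rows of every input, in the order in which they arrive, and a display step that prints them as one table. It is an approximation for illustration only. The class MergeSketch, its methods, the sample rows and the table layout are assumptions, and the actual console output of tLogRow may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the code Talend generates.
class MergeSketch {

    // One record of the two-column schema used in this scenario.
    static final class Row {
        final String country;
        final Integer points;

        Row(String country, Integer points) {
            this.country = country;
            this.points = points;
        }
    }

    // Plays the role of the merge step: concatenates the inputs in arrival order.
    static List<Row> unite(List<List<Row>> inputs) {
        List<Row> merged = new ArrayList<>();
        for (List<Row> input : inputs) {
            merged.addAll(input);
        }
        return merged;
    }

    // Renders the rows as a simple fixed-width table (layout chosen for the sketch).
    static String renderTable(List<Row> rows) {
        StringBuilder table = new StringBuilder();
        table.append(String.format("|%-10s|%-6s|", "Country", "Points")).append('\n');
        for (Row row : rows) {
            table.append(String.format("|%-10s|%-6s|", row.country, row.points)).append('\n');
        }
        return table.toString();
    }

    public static void main(String[] args) {
        // Sample data invented for the sketch, one list per input file.
        List<Row> fileA = List.of(new Row("France", 10), new Row("Spain", 8));
        List<Row> fileB = List.of(new Row("Italy", 9));

        System.out.print(renderTable(unite(List.of(fileA, fileB))));
    }
}
```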