Scenario: Transforming a list of files as data flow - 6.1

Talend Open Studio for Big Data Components Reference Guide

The following scenario describes a Job that iterates on a list of files, picks up each file name and the current date, and transforms them into a data flow that is displayed on the console.
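Outside the Studio, the logic of this Job can be approximated in plain Java. The sketch below is not Talend-generated code. It assumes the settings used in this scenario (case-insensitive matching, no subdirectories), a hypothetical `.txt` extension filter standing in for the file mask, and a plain `HashMap` standing in for Talend's `globalMap`. The class and method names are illustrative only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FileListToFlow {

    // One row of the flow, mirroring the Filename (String) and Date (Date) columns.
    record Row(String filename, Date date) {}

    // Approximates tFileList with Case sensitive cleared and Include Subdirectories
    // unchecked: regular files directly under dir whose names end with the
    // (hypothetical) extension, compared case-insensitively.
    static List<Path> listFiles(Path dir, String extension) throws IOException {
        String ext = extension.toLowerCase(Locale.ROOT);
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(ext))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Each iteration publishes the current path into a globalMap-style map;
    // the iterate-to-flow step reads it back and emits one row per iteration.
    static List<Row> iterateToFlow(List<Path> files) {
        Map<String, Object> globalMap = new HashMap<>();
        List<Row> rows = new ArrayList<>();
        for (Path file : files) {
            globalMap.put("tFileList_1_CURRENT_FILEPATH", file.toString());
            String filename = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
            rows.add(new Row(filename, new Date())); // stand-in for TalendDate.getCurrentDate()
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to a local "Countries" directory, as in this scenario.
        Path dir = Paths.get(args.length > 0 ? args[0] : "Countries");
        for (Row row : iterateToFlow(listFiles(dir, ".txt"))) {
            System.out.println(row.filename() + " | " + row.date());
        }
    }
}
```

The iterate step and the flow step are kept as separate methods to mirror the separate tFileList and tIterateToFlow components: the first only produces one file per iteration, the second turns each iteration into a row.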

  • Drop the following components from the Palette onto the design workspace: tFileList, tIterateToFlow and tLogRow.

  • Connect tFileList to tIterateToFlow using an Iterate link, and connect tIterateToFlow to tLogRow using a Row > Main connection.

  • In the tFileList Component view, set the directory where the list of files is stored.

  • In this example, the files are three simple .txt files held in one directory: Countries.

  • Case does not matter in this example, so clear the Case sensitive check box.

  • Leave the Include Subdirectories check box unchecked.

  • Then select the tIterateToFlow component and click Edit schema to define the new schema.

  • Add two new columns: Filename of String type and Date of Date type. Make sure you define a valid Java date pattern for the Date column.

  • Click OK to validate.

  • Notice that the newly created schema appears in the Mapping table.

  • In each cell of the Value field, press Ctrl+Space to access the list of global and user-specific variables.

  • For the Filename column, use the global variable tFileList_1_CURRENT_FILEPATH. It retrieves the path of the current file, which identifies each file the Job iterates on.

  • For the Date column, use the Talend routine TalendDate.getCurrentDate() (in Java).

  • Then, in the tLogRow Component view, select the Print values in cells of a table check box.

  • Save your Job and press F6 to execute it.

The path of each file is displayed in the Filename column and the current date in the Date column.
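As a rough illustration of this table output and of the Java date pattern used for the Date column, the sketch below formats each date with `SimpleDateFormat` and pads the cells into pipe-separated columns. The pattern `dd-MM-yyyy`, the class and method names, the sample file names and the exact layout are assumptions; the real tLogRow table format may differ.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;

final class LogRowTable {

    // Hypothetical Date column pattern; use the pattern defined in your schema.
    static final String DATE_PATTERN = "dd-MM-yyyy";

    // Turns one (Filename, Date) row into display cells, applying the date pattern.
    static String[] toCells(String filename, Date date) {
        return new String[] { filename, new SimpleDateFormat(DATE_PATTERN, Locale.ROOT).format(date) };
    }

    // Pads every cell to its column width and separates columns with '|'.
    static String renderTable(String[] header, List<String[]> rows) {
        int[] width = new int[header.length];
        for (int c = 0; c < header.length; c++) {
            width[c] = Math.max(1, header[c].length()); // width 0 would be an invalid format
            for (String[] row : rows) {
                width[c] = Math.max(width[c], row[c].length());
            }
        }
        StringBuilder out = new StringBuilder();
        appendLine(out, header, width);
        for (String[] row : rows) {
            appendLine(out, row, width);
        }
        return out.toString();
    }

    private static void appendLine(StringBuilder out, String[] cells, int[] width) {
        out.append('|');
        for (int c = 0; c < cells.length; c++) {
            out.append(String.format("%-" + width[c] + "s", cells[c])).append('|');
        }
        out.append('\n');
    }

    public static void main(String[] args) {
        // Hypothetical file paths, for illustration only.
        List<String[]> rows = Arrays.asList(
                toCells("Countries/France.txt", new Date()),
                toCells("Countries/Japan.txt", new Date()));
        System.out.print(renderTable(new String[] { "Filename", "Date" }, rows));
    }
}
```

For a single row `("a.txt", 5 January 2024)`, `renderTable` returns `|Filename|Date      |` followed by `|a.txt   |05-01-2024|`, one line each.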