Scenario: Reading full rows in a delimited file - 6.1

Talend Components Reference Guide

The following scenario creates a two-component Job that reads complete rows from the delimited file states.csv and displays them on the console.

The file states.csv holds ten rows of data, preceded by a header row.

  1. Create a new Job and add a tFileInputFullRow component and a tLogRow component by typing their names in the design workspace or dropping them from the Palette.

  2. Link the tFileInputFullRow component to the tLogRow component using a Row > Main connection.

  3. Double-click the tFileInputFullRow component to open its Basic settings view on the Component tab.

  4. Click the [...] button next to Edit schema to view the schema of the data passed to the tLogRow component. Note that the schema is read-only and consists of a single column, line.

  5. In the File Name field, browse to or enter the path to the file to be processed. In this scenario, it is E:/states.csv.

  6. In the Row Separator field, enter the separator used to identify the end of a row. In this example, it is the default value \n.

  7. In the Header field, enter 1 to skip the header row at the beginning of the file.

  8. Double-click the tLogRow component to open its Basic settings view on the Component tab.

    In the Mode area, select Table (print values in cells of a table) for better readability of the result.

  9. Press Ctrl+S to save your Job and then F6 to execute it.

    The ten rows of data in the delimited file states.csv are read one by one, ignoring field separators, and each complete row is displayed on the console.

    To extract fields from the rows, use tExtractDelimitedFields, tExtractPositionalFields, or tExtractRegexFields. For more information, see the documentation for each of these components.
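For readers who want to see the behavior outside the Studio, the following is a minimal Python sketch of what the Job configured above does conceptually: split the file on the row separator, skip the header rows, and pass each remaining row on unchanged in a single column named line. It is not the code that Talend generates. The function name read_full_rows, its parameters, and the sample data (including the ; field separator) are assumptions made for illustration only.

```python
import os
import tempfile


def read_full_rows(path, row_separator="\n", header=0):
    """Return each row of the file as a one-column record keyed by 'line'."""
    # newline="" disables newline translation, so the separator is matched literally.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    rows = text.split(row_separator)
    # A trailing separator leaves an empty last element; drop it.
    if rows and rows[-1] == "":
        rows.pop()
    # Skip the first `header` rows; field separators inside a row are left untouched.
    return [{"line": row} for row in rows[header:]]


if __name__ == "__main__":
    # Hypothetical sample data, not the states.csv file from the scenario.
    sample = "id;name\n1;Alpha\n2;Beta\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(sample)
        for record in read_full_rows(path, row_separator="\n", header=1):
            print(record["line"])
```

Each record still contains the whole row as one string. Splitting record["line"] on the field separator, for example with str.split(";"), would be a rough analogue of adding a tExtractDelimitedFields component after the input.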