Scenario: Reading full rows in a delimited file - 6.3

Talend Open Studio for Big Data Components Reference Guide


The following scenario creates a two-component Job that reads complete rows from the delimited file states.csv and displays them on the console.

The file states.csv holds a header row followed by ten rows of data. Its content is as follows:

StateID;StateName
1;Alabama
2;Alaska
3;Arizona
4;Arkansas
5;California
6;Colorado
7;Connecticut
8;Delaware
9;Florida
10;Georgia

  1. Create a new Job and add a tFileInputFullRow component and a tLogRow component by typing their names in the design workspace or dropping them from the Palette.

  2. Link the tFileInputFullRow component to the tLogRow component using a Row > Main connection.

  3. Double-click the tFileInputFullRow component to open its Basic settings view on the Component tab.

  4. Click the [...] button next to Edit schema to view the data to be passed on to the tLogRow component. Note that the schema is read-only and consists of a single column, named line.

  5. In the File Name field, browse to or enter the path to the file to be processed. In this scenario, it is E:/states.csv.

  6. In the Row Separator field, enter the separator used to identify the end of a row. In this example, it is the default value \n.

  7. In the Header field, enter 1 to skip the header row at the beginning of the file.

  8. Double-click the tLogRow component to open its Basic settings view on the Component tab.

    In the Mode area, select Table (print values in cells of a table) for better readability of the result.

  9. Press Ctrl+S to save your Job and then F6 to execute it.

    The ten rows of data in the delimited file states.csv are read one by one, ignoring field separators, and each complete row is displayed on the console.
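The behavior configured in the steps above can be illustrated in plain Java. This is a minimal sketch, not the code that Talend generates for tFileInputFullRow: the class name FullRowReaderSketch and the method readFullRows are hypothetical, and an inline string stands in for the file E:/states.csv so that the example runs anywhere. It shows the two settings that matter here: each row is kept as one unsplit string, and the first row is skipped because Header is set to 1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Talend's generated code.
public class FullRowReaderSketch {

    // Reads every row as one unsplit string and skips the first `header` rows.
    // Field separators such as ';' are left untouched.
    // Note: BufferedReader.readLine() ends a row at \n, \r, or \r\n, which is
    // slightly broader than a Row Separator of exactly "\n".
    public static List<String> readFullRows(Reader source, int header) {
        List<String> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int rowIndex = 0;
            while ((line = reader.readLine()) != null) {
                if (rowIndex++ < header) {
                    continue; // skip header rows, as with Header = 1
                }
                rows.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Inline sample standing in for the first rows of states.csv.
        String sample = "StateID;StateName\n1;Alabama\n2;Alaska\n3;Arizona\n";
        for (String row : readFullRows(new StringReader(sample), 1)) {
            System.out.println(row);
        }
    }
}
```

To read an actual file instead of the inline sample, a java.io.FileReader for the file path could be passed as the source argument.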

    To extract fields from the rows, you must use tExtractDelimitedFields, tExtractPositionalFields, or tExtractRegexFields. For more information, see the documentation of these components.
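Conceptually, extracting delimited fields from a full row means splitting the row string on the field separator. The following is a minimal sketch in plain Java of that idea, not the implementation of tExtractDelimitedFields: the class name SplitFullRowSketch and the method splitRow are hypothetical, and the sketch assumes a plain separator with no quoting or escaping rules.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the tExtractDelimitedFields implementation.
public class SplitFullRowSketch {

    // Splits one full row into its fields on a literal separator.
    // Pattern.quote makes the separator literal rather than a regex;
    // the limit of -1 keeps trailing empty fields.
    public static List<String> splitRow(String fullRow, String separator) {
        return Arrays.asList(fullRow.split(Pattern.quote(separator), -1));
    }

    public static void main(String[] args) {
        // One full row as read from states.csv.
        List<String> fields = splitRow("5;California", ";");
        System.out.println("StateID=" + fields.get(0) + ", StateName=" + fields.get(1));
        // prints: StateID=5, StateName=California
    }
}
```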