Scenario 2: Handling a positional file based on a dynamic schema - 6.1

Talend Components Reference Guide


This scenario describes a four-component Job that reads data from a positional file, writes the data to another positional file, and replaces the padding characters with spaces. The schema column details are not defined in the positional file components; instead, the components leverage a reusable dynamic schema. The input file used in this scenario is as follows:

id----name--------city--------
1-----Andrews-----Paris-------
2-----Mark--------London------
3-----Marie-------Paris-------
4-----Michael-----Washington--
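To make the layout of this file concrete, the following minimal Python sketch shows how column lengths translate into character offsets within each line. It is only an illustration of the positional format, not what Talend Studio generates. The helper name `field_slices` is hypothetical, and the lengths 6, 12, and 12 are the ones this scenario uses for the three columns.

```python
from itertools import accumulate


def field_slices(lengths):
    """Turn column lengths into (start, end) offsets within a line."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Column lengths used in this scenario: ID = 6, Name = 12, City = 12.
lengths = [6, 12, 12]
slices = field_slices(lengths)
print(slices)  # [(0, 6), (6, 18), (18, 30)]

# Cut one data line of the sample file into its raw, still padded fields.
line = "2-----Mark--------London------"
fields = [line[start:end] for start, end in slices]
print(fields)  # ['2-----', 'Mark--------', 'London------']
```

Each field keeps its '-' padding at this stage; removing and replacing the padding is handled by the positional file components configured later in the scenario.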

Dropping and linking components

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tSetDynamicSchema, tFileInputPositional, and tFileOutputPositional.

  2. Connect the tFixedFlowInput component to the tSetDynamicSchema using a Row > Main connection to form a subjob. This subjob will define a reusable dynamic schema.

  3. Connect the tFileInputPositional component to the tFileOutputPositional component using a Row > Main connection to form another subjob. This subjob will read data from the input positional file and write the data to another positional file based on the dynamic schema set in the previous subjob.

  4. Connect the tFixedFlowInput component to the tFileInputPositional component using a Trigger > On Subjob Ok connection to link the two subjobs together.

Configuring the first subjob: creating a dynamic schema

  1. Double-click the tFixedFlowInput component to show its Basic settings view and define its properties.

  2. Click the [...] button next to Edit schema to open the [Schema] dialog box.

  3. Click the [+] button to add three columns: ColumnName, ColumnType, and ColumnLength, and set their types to String, String, and Integer respectively to define the minimum properties required for a positional file schema. Then, click OK to close the dialog box.

  4. Select the Use Inline Table option and click the [+] button three times to add three lines. In the ColumnName field, name the lines after the actual columns of the input file to read: ID, Name, and City. In the ColumnType field, set id_Integer for column ID and id_String for columns Name and City. In the ColumnLength field, set the length of each column: 6 for ID, and 12 for Name and City. Note that the column names you enter in this table will make up the header of the output file.

  5. Double-click the tSetDynamicSchema component to open its Basic settings view.

  6. Click Sync columns to ensure that the schema structure is properly retrieved from the preceding component.

  7. Under the Parameters table, click the [+] button to add three lines in the table.

  8. Click in the Property field for each line, and select ColumnName, Type, and Length respectively.

  9. Click in the Value field for each line, and select ColumnName, ColumnType, and ColumnLength respectively.

    Now, with the values set in the inline table of the tFixedFlowInput component retrieved, the following data structure is defined in the dynamic schema:

    Column Name    Type       Length
    ID             Integer    6
    Name           String     12
    City           String     12
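The mapping performed in this subjob can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not the code Talend Studio generates: the function `build_dynamic_schema` and the variable names are hypothetical, while the column names, types, and lengths are the values entered in this scenario.

```python
# Rows entered in the inline table of tFixedFlowInput (values from this scenario).
inline_rows = [
    {"ColumnName": "ID", "ColumnType": "id_Integer", "ColumnLength": 6},
    {"ColumnName": "Name", "ColumnType": "id_String", "ColumnLength": 12},
    {"ColumnName": "City", "ColumnType": "id_String", "ColumnLength": 12},
]

# Property -> Value pairs as set in the Parameters table of tSetDynamicSchema.
parameter_map = {
    "ColumnName": "ColumnName",
    "Type": "ColumnType",
    "Length": "ColumnLength",
}


def build_dynamic_schema(rows, mapping):
    """Hypothetical helper: one schema column per incoming row."""
    return [{prop: row[source] for prop, source in mapping.items()} for row in rows]


schema = build_dynamic_schema(inline_rows, parameter_map)
for column in schema:
    print(column["ColumnName"], column["Type"], column["Length"])
```

The point of the sketch is that each row flowing out of tFixedFlowInput becomes one column definition, which is why the inline table needs exactly one line per column of the positional file.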

Configuring the second subjob: reading and writing positional data

  1. Double-click the tFileInputPositional component to open its Basic settings view.

    Warning

    The dynamic schema feature is only supported in Built-In mode and requires the input file to have a header row.

  2. Select the Use existing dynamic check box, and from the Component List that appears, select the tSetDynamicSchema component you used to create the dynamic schema. In this use case, only one tSetDynamicSchema component is used, so it is automatically selected.

  3. In the File name/Stream field, enter the path to the input positional file, or browse to the file path by clicking the [...] button.

  4. Fill in the Header, Footer and Limit fields according to your input file structure and your needs. In this scenario, we only need to skip the first row when reading the input file. To do this, enter 1 in the Header field and leave the other fields as they are.

  5. Click the [...] button next to Edit schema to open the [Schema] dialog box, define only one column, dyn in this example, and select Dynamic from the Type list. Then, click OK to close the [Schema] dialog box and propagate the changes.

  6. Select the Customize check box, enter '-' in the Padding char field, and keep the other settings as they are.

  7. Double-click the tFileOutputPositional component to open its Basic settings view.

  8. Select the Use existing dynamic check box, specify the output file path, and select the Include header check box.

  9. In the Padding char field, enter ' ' so that the padding characters will be replaced with spaces in the output file.
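The effect of these settings can be approximated with the Python sketch below. It is an illustration under assumptions rather than Talend's implementation: the functions `read_positional` and `write_positional` are hypothetical, values are treated as plain strings, and fields are assumed to be left-aligned with padding on the right, as they are in the sample input.

```python
# Column names and lengths from the dynamic schema of this scenario.
SCHEMA = [("ID", 6), ("Name", 12), ("City", 12)]

INPUT_LINES = [
    "id----name--------city--------",
    "1-----Andrews-----Paris-------",
    "2-----Mark--------London------",
    "3-----Marie-------Paris-------",
    "4-----Michael-----Washington--",
]


def read_positional(lines, schema, header=1, pad_char="-"):
    """Skip `header` rows, slice each line by length, strip the padding."""
    records = []
    for line in lines[header:]:
        position = 0
        record = {}
        for name, length in schema:
            record[name] = line[position:position + length].rstrip(pad_char)
            position += length
        records.append(record)
    return records


def write_positional(records, schema, pad_char=" ", include_header=True):
    """Pad each value to its column length (left alignment is an assumption)."""
    output = []
    if include_header:
        output.append("".join(name.ljust(length, pad_char) for name, length in schema))
    for record in records:
        output.append(
            "".join(record[name].ljust(length, pad_char) for name, length in schema)
        )
    return output


records = read_positional(INPUT_LINES, SCHEMA, header=1, pad_char="-")
output_lines = write_positional(records, SCHEMA, pad_char=" ", include_header=True)
for line in output_lines:
    print(repr(line))
```

In this model, the header row of the output is built from the column names of the dynamic schema rather than copied from the input file, which matches the note that the names given in the inline table make up the output header.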

Saving and executing the Job

  1. Press Ctrl+S to save your Job to ensure that all the configured parameters take effect.

  2. Press F6 or click Run on the Run tab to execute the Job.

    The data is read from the input positional file and written into the output positional file, with the padding characters replaced by spaces.