Scenario 1: From Positional to XML file - 6.1

Talend Components Reference Guide

The following scenario describes a two-component Job that reads data from a positional input file containing contract numbers, customer references, and insurance numbers, as shown below, and writes the selected data, extracted according to its position in each row, to an XML file.

Contract       CustomerRef    InsuranceNr
00001          8200           50330      
00001          8201           50331      
00002          8202           50332      
00002          8203           50333      
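
Because the file is positional, each field occupies a fixed number of characters, and the Pattern field configured later must list those widths. The Java sketch below is one way to estimate them from the header row. The heuristic is an assumption, not a Talend feature: it expects every column label to start at the first character of its column and to contain no spaces, and it takes the last width from the header length, so check that value against the longest data row. If the spacing of the sample above survived intact, it yields widths of 15, 15 and 11. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PatternFromHeader {

    // Heuristic (assumption): each header label starts at the first character
    // of its column, so a field's width is the distance to the next label.
    static List<Integer> widthsFromHeader(String header) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < header.length(); i++) {
            boolean labelStart = header.charAt(i) != ' '
                    && (i == 0 || header.charAt(i - 1) == ' ');
            if (labelStart) {
                starts.add(i);
            }
        }
        List<Integer> widths = new ArrayList<>();
        for (int k = 0; k < starts.size(); k++) {
            // The last column runs to the end of the header line.
            int end = (k + 1 < starts.size()) ? starts.get(k + 1) : header.length();
            widths.add(end - starts.get(k));
        }
        return widths;
    }

    public static void main(String[] args) {
        String header = "Contract       CustomerRef    InsuranceNr";
        System.out.println(widthsFromHeader(header));
    }
}
```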

Dropping and linking components

  1. Drop a tFileInputPositional component from the Palette to the design workspace.

  2. Drop a tFileOutputXML component as well. This component writes the references to a file in a structured way.

  3. Right-click the tFileInputPositional component and select Row > Main. Then drag the link onto the tFileOutputXML component and release it when the plug symbol appears.

Configuring data input

  1. Double-click the tFileInputPositional component to show its Basic settings view and define its properties.

  2. Define the Job Property type if needed. For this scenario, we use the built-in Property type.

    Unlike Repository, Built-In means that the properties are defined for this component only rather than stored centrally for reuse.

  3. Fill in a path to the input file in the File Name field. This field is mandatory.

  4. If needed, define the Row separator that marks the end of a row. By default, it is a carriage return.

  5. If required, select the Use byte length as the cardinality check box to enable support for double-byte characters.

  6. Define the Pattern that delimits the fields in a row. The pattern is a series of length values, one for each field of your input file, entered as a single comma-separated string enclosed in quotes. Make sure the values you enter match the schema defined.

  7. Fill in the Header, Footer and Limit fields according to the structure of your input file and your needs. In this scenario, only the first row (the column labels) needs to be skipped, so enter 1 in the Header field and leave the other fields as they are.

  8. Next to Schema, select Repository if the input schema is stored in the Repository. In this use case, we use a Built-In input schema to define the data to pass on to the tFileOutputXML component.

  9. Load or edit the schema using the Edit schema function. Define three columns, Contract, CustomerRef and InsuranceNr, matching the structure of the input file. Then click OK to close the [Schema] dialog box and propagate the changes.
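
Conceptually, the input settings above amount to the following Java sketch (Java 16 or later, for the record type): skip the number of lines given in Header, then cut each remaining line into consecutive slices whose lengths come from the Pattern. It is an illustration, not the code Talend generates. The widths {15, 15, 11} are an assumption derived from the sample file, the class, record and method names are hypothetical, values are trimmed and blank lines skipped as choices of this sketch, and lengths are counted in characters, which corresponds to leaving Use byte length as the cardinality cleared.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class PositionalReader {

    // One row; the fields mirror the three-column schema (Contract, CustomerRef, InsuranceNr).
    record ContractRow(String contract, String customerRef, String insuranceNr) {}

    // Cuts one line into fields using the Pattern widths. Values are trimmed,
    // and a line shorter than the pattern yields empty trailing fields.
    static String[] split(String line, int[] widths) {
        String[] fields = new String[widths.length];
        int pos = 0;
        for (int i = 0; i < widths.length; i++) {
            int start = Math.min(pos, line.length());
            int end = Math.min(pos + widths[i], line.length());
            fields[i] = line.substring(start, end).trim();
            pos += widths[i];
        }
        return fields;
    }

    // Skips the first `header` lines (the Header field), then maps each
    // remaining non-blank line to a row.
    static List<ContractRow> read(BufferedReader in, int[] widths, int header) throws IOException {
        List<ContractRow> rows = new ArrayList<>();
        String line;
        int lineNo = 0;
        while ((line = in.readLine()) != null) {
            if (lineNo++ < header || line.isBlank()) {
                continue;
            }
            String[] f = split(line, widths);
            rows.add(new ContractRow(f[0], f[1], f[2]));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String sample = String.join("\n",
                "Contract       CustomerRef    InsuranceNr",
                "00001          8200           50330",
                "00001          8201           50331");
        int[] pattern = {15, 15, 11}; // assumed equivalent of a Pattern of "15,15,11"
        read(new BufferedReader(new StringReader(sample)), pattern, 1)
                .forEach(System.out::println);
    }
}
```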

Configuring data output

  1. Double-click tFileOutputXML to show its Basic settings view.

  2. Enter the XML output file path.

  3. Define the row tag that wraps each row of data; in this use case, ContractRef.

  4. Click the three-dot button next to Edit schema to view the data structure, and click Sync columns to retrieve the data structure from the input component if needed.

  5. Switch to the Advanced settings tab view to define other settings for the XML output.

  6. Click the plus button to add a line to the Root tags table, and enter one or more root tags to wrap the XML output structure; in this case, ContractsList.

  7. Define parameters in the Output format table if needed. For example, select the As attribute check box for a column to use its name and value as an attribute of the parent XML element, or clear the Use schema column name check box for a column to specify a tag label other than the column label from the input schema. In this use case, we keep all the default output format settings as they are.

  8. To group output rows by contract number, select the Use dynamic grouping check box, add a line to the Group by table, select Contract in the Column list, and enter an attribute label for it in the Attribute label field.

  9. Leave all the other parameters as they are.
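
The output settings can be pictured with the following sketch, which uses the JDK's StAX API (javax.xml.stream.XMLStreamWriter) to write the ContractsList root tag, one ContractRef row tag per row, and one group element per contract number. It is not Talend's generated code, and the exact structure Talend emits for dynamic grouping may differ: the group element name ContractGroup and the attribute label id are hypothetical, since the scenario does not give them. The sketch also groups consecutive rows only, which works here because rows with the same contract number are adjacent in the sample.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Objects;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class GroupedXmlWriter {

    record ContractRow(String contract, String customerRef, String insuranceNr) {}

    // Writes the root tag, then one group element per run of consecutive rows
    // sharing a Contract value, with each row wrapped in the row tag.
    static String write(List<ContractRow> rows) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument("1.0");
        xml.writeStartElement("ContractsList");           // root tag (Advanced settings)
        String current = null;
        for (ContractRow r : rows) {
            if (!Objects.equals(current, r.contract())) { // a new group starts
                if (current != null) {
                    xml.writeEndElement();                // close the previous group
                }
                xml.writeStartElement("ContractGroup");   // hypothetical group element name
                xml.writeAttribute("id", r.contract());   // hypothetical Attribute label
                current = r.contract();
            }
            xml.writeStartElement("ContractRef");         // row tag (Basic settings)
            writeField(xml, "Contract", r.contract());
            writeField(xml, "CustomerRef", r.customerRef());
            writeField(xml, "InsuranceNr", r.insuranceNr());
            xml.writeEndElement();                        // </ContractRef>
        }
        if (current != null) {
            xml.writeEndElement();                        // close the last group
        }
        xml.writeEndElement();                            // </ContractsList>
        xml.writeEndDocument();
        xml.close();
        return out.toString();
    }

    // Column values become child elements named after the schema columns.
    static void writeField(XMLStreamWriter xml, String name, String value) throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(value); // escapes markup characters such as & and <
        xml.writeEndElement();
    }

    public static void main(String[] args) throws XMLStreamException {
        List<ContractRow> rows = List.of(
                new ContractRow("00001", "8200", "50330"),
                new ContractRow("00001", "8201", "50331"),
                new ContractRow("00002", "8202", "50332"));
        System.out.println(write(rows));
    }
}
```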

Saving and executing the Job

  1. Press Ctrl+S to save your Job to ensure that all the configured parameters take effect.

  2. Press F6 or click Run on the Run tab to execute the Job.

    The input file is read row by row, split according to the length values defined in the Pattern field, and written to an XML file as defined in the output settings. You can open the result in any standard XML editor.
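
Besides opening the file in an XML editor, you can check it programmatically. This Java sketch parses the output with the JDK's DOM parser, which throws an exception if the file is not well-formed, then prints the root tag and counts the row elements. The default file name out.xml is hypothetical; pass the path you entered in tFileOutputXML. With the four data rows of the sample and Header set to 1, you would expect four ContractRef elements, assuming no rows are filtered out.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CheckXmlOutput {

    // Parses the file; DocumentBuilder.parse throws a SAXException if the
    // document is not well-formed XML.
    static Document parse(File xmlFile) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmlFile);
    }

    // Counts elements with the given tag name anywhere in the document.
    static int count(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default path: pass the file name set in tFileOutputXML instead.
        File out = new File(args.length > 0 ? args[0] : "out.xml");
        Document doc = parse(out);
        System.out.println("Root tag: " + doc.getDocumentElement().getTagName());
        System.out.println("ContractRef rows: " + count(doc, "ContractRef"));
    }
}
```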