Scenario: Regex to Positional file - 6.3

Talend Components Reference Guide

The following scenario creates a two-component Job that reads data from an input file using a regular expression and writes the extracted fields to a positional file.

Dropping and linking the components

  1. Drop a tFileInputRegex component from the Palette to the design workspace.

  2. Drop a tFileOutputPositional component the same way.

  3. Right-click on the tFileInputRegex component and select Row > Main. Drag this main row link onto the tFileOutputPositional component and release when the plug symbol displays.

Configuring the components

  1. Select the tFileInputRegex component again to display its Component view, and define its properties:

  2. The properties are Built-in for this scenario, so they are set for this Job only.

  3. Enter the path to the input file in the File Name field. This field is mandatory.

  4. Define the Row separator identifying the end of a row.

  5. Then define the Regular expression that delimits the fields of a row to be passed on to the next component. You can type in the regular expression using Java code, on multiple lines if needed.

    Warning

    The regular expression must be enclosed in double quotes.

  6. In this expression, make sure you include all subpatterns matching the fields to be extracted.

  7. In this scenario, ignore the header, footer and limit fields.

  8. Select a local (Built-in) Schema to define the data to pass on to the tFileOutputPositional component.

  9. You can load or create the schema through the Edit Schema function.

  10. Then select the tFileOutputPositional component and define its properties:

  11. Enter the Positional file output path.

  12. Enter the Encoding standard in which the output file is to be encoded. Note that, for the time being, the encoding consistency verification is not supported.

  13. Select the Schema type. Click on Sync columns to automatically synchronize the schema with the Input file schema.
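
The regular expression steps above can be illustrated outside the Studio with plain Java. The following is a minimal sketch, not Talend-generated code: the class name RegexFieldSketch, the sample row layout (id;name;amount) and the pattern itself are assumptions made for illustration. It shows why the expression is written in double quotes (it is a Java string literal, so backslashes are doubled) and how each parenthesized subpattern yields one field of the row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Talend: one capture group per field.
class RegexFieldSketch {

    // Assumed row layout for illustration: "<id>;<name>;<amount>".
    // Written as a Java string literal, hence the double quotes and
    // the doubled backslashes.
    static final Pattern ROW =
            Pattern.compile("^(\\d+);([^;]+);(\\d+\\.\\d{2})$");

    // Returns the captured fields of a row, or an empty list when the
    // row does not match the pattern.
    static List<String> extract(String row) {
        Matcher m = ROW.matcher(row);
        List<String> fields = new ArrayList<>();
        if (m.matches()) {
            // Group 0 is the whole match; fields start at group 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(extract("42;Alice;19.90")); // prints [42, Alice, 19.90]
    }
}
```

In this sketch the number of subpatterns (three) corresponds to the number of columns a schema would need to declare; a subpattern left out of the expression produces no field.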

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Now go to the Run tab, and click on Run to execute the Job.

    The input file is read row by row, and each row is split into fields based on the Regular Expression definition. You can open the resulting positional file using any standard text editor.
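
To give an idea of what a positional (fixed-width) row looks like, here is a minimal Java sketch. It is not the code that tFileOutputPositional generates: the class name PositionalLineSketch, the column widths, the left alignment, the space padding and the truncation of over-long values are all assumptions chosen for illustration.

```java
// Hypothetical helper, not part of Talend: builds one fixed-width line.
class PositionalLineSketch {

    // Left-aligns the value in a column of the given width, padding with
    // spaces on the right; values longer than the width are truncated.
    static String pad(String value, int width) {
        if (value.length() >= width) {
            return value.substring(0, width);
        }
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Concatenates the padded fields; widths[i] is the assumed size of
    // column i. Both arrays are expected to have the same length.
    static String toPositional(String[] fields, int[] widths) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            line.append(pad(fields[i], widths[i]));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String line = toPositional(
                new String[] {"42", "Alice", "19.90"},
                new int[] {5, 10, 8});
        // Brackets make the trailing padding visible.
        System.out.println("[" + line + "]"); // prints [42   Alice     19.90   ]
    }
}
```

Because every column occupies a fixed number of characters, no delimiter is needed: a reader of the file locates each field by its position in the row.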