Centralizing File Positional metadata - 6.2

Talend MDM Platform Studio User Guide


If you often need to read data from and/or write data to certain positional files, you may want to centralize their metadata in the Repository for easy reuse. File Positional metadata can be used to define the properties of tFileInputPositional, tFileOutputPositional, and tFileInputMSPositional components.

Like the [New Delimited File] wizard, the [New Positional File] wizard gathers both file connection and schema definitions in a four-step procedure.

Note

The file schema creation is very similar for all types of file connections: Delimited, Positional, Regex, XML, or Ldif.

To create a File Positional connection from scratch, expand Metadata in the Repository tree view, right-click File positional and select Create file positional from the contextual menu to open the file metadata setup wizard.

To centralize a file connection and its schema you have defined in a Job, click the icon in the Basic settings view of the relevant component with its Property Type set to Built-in to open the file metadata setup wizard.

Then define the general properties and file schema in the wizard.

Defining the general properties

  1. In the file metadata setup wizard, fill in the Name field, which is mandatory, and, optionally, the Purpose and Description fields. The information you provide in the Description field appears as a tooltip when you move your mouse pointer over the file connection.

  2. If needed, set the version and status in the Version and Status fields respectively. You can also manage the version and status of a Repository item in the [Project Settings] dialog box. For more information, see Version management and Status management respectively.

  3. If needed, click the Select button next to the Path field to select a folder under the File positional node to hold your newly created file connection. Note that you cannot select a folder if you are editing an existing connection, but you can drag and drop it to a new folder whenever you want.

  4. Click Next when you have finished defining the general properties.

Defining the file path, format and marker positions

  1. Specify the full path of the source file in the File field, or click the Browse... button to search for the file.

    Note

    The Universal Naming Convention (UNC) path notation is not supported. If your source file is on a LAN host, first map the network folder to a local drive.

  2. Select the Encoding type and the OS Format the file was created in. This information is used to prefill fields in the subsequent steps. If the list does not include the appropriate format, ignore the OS Format setting.

    The file is loaded and the File Viewer area shows a file preview and allows you to place your position markers.

  3. Click the file preview and set the markers against the ruler to define the file column properties. The orange arrow helps you refine the position.

    The Field Separator and Marker Position fields are automatically filled with a series of figures separated by commas.

    The figures in the Field Separator field are the numbers of characters between the separators, that is, the lengths of the columns of the loaded file. The asterisk symbol means all remaining characters on the row, starting from the preceding marker position. You can change the figures to specify the column lengths precisely.

    The Marker Position field shows the exact position of each marker on the ruler, in units of characters. You can change the figures to specify the positions precisely.

    To move a marker, click its arrow and drag it to the new position. To remove a marker, click its arrow and drag it towards the ruler until an icon appears.

  4. Click Next to continue.
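The relationship between the Field Separator and Marker Position values can be illustrated with a short Python sketch. It assumes that marker positions are counted in characters from the start of the row (position 0) and that an asterisk, if used, appears only as the last entry. The function names are hypothetical and are not part of Talend Studio.

```python
from itertools import accumulate


def lengths_to_marker_positions(field_separator: str) -> list[int]:
    """Convert Field Separator column lengths (e.g. "5,4,*") to marker positions.

    Assumption: each marker sits at the running total of the fixed column
    lengths, counted from 0; a trailing "*" adds no marker of its own.
    """
    lengths = [int(part) for part in field_separator.split(",") if part.strip() != "*"]
    return list(accumulate(lengths))


def split_positional_row(row: str, field_separator: str) -> list[str]:
    """Cut one fixed-width row into column values using the column lengths."""
    values = []
    start = 0
    for part in field_separator.split(","):
        part = part.strip()
        if part == "*":
            # "*" takes every remaining character, starting at the preceding marker.
            values.append(row[start:])
            break
        end = start + int(part)
        values.append(row[start:end])
        start = end
    return values
```

For example, with a Field Separator of `5,4,*`, the markers sit at positions 5 and 9, and the row `ABCDE1234rest of line` is split into `ABCDE`, `1234` and `rest of line`.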

Defining the file parsing parameters

In this view, you define the file parsing parameters so that the file schema can be properly retrieved.

At this stage, the preview shows the file columns based on the markers' positions.

  1. Set the Field and Row separators in the File Settings area.

    • If needed, change the figures in the Field Separator field to specify the column lengths precisely.

    • If the row separator of your file is not the standard EOL (end of line), select Custom String from the Row Separator list and specify the character string in the Corresponding Character field.

  2. If your file has any header rows to be excluded from the data content, select the Header check box in the Rows To Skip area and define the number of rows to be ignored in the corresponding field. Also, if you know that the file contains footer information, select the Footer check box and set the number of rows to be ignored.

  3. The Limit of Rows area allows you to restrict the extent of the file being parsed. If needed, select the Limit check box and set or select the desired number of rows.

  4. If the file contains column labels, select the Set heading row as column names check box to use the first parsed row as labels for the schema columns. Note that the number of header rows to be skipped is then incremented by 1.

  5. Click Refresh Preview on the Preview panel to apply the settings and view the result in the viewer.

  6. Click Next to proceed to the next view to check and customize the generated file schema.
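The row-level settings in this view can be approximated in plain Python to show how they interact. This is a minimal sketch under stated assumptions, not Talend's implementation: the helper name `select_rows` is hypothetical, footer rows are removed before header rows are skipped, and, when the heading row option is set, the first row after the skipped header rows supplies the column labels.

```python
from typing import Optional


def select_rows(text: str, row_separator: Optional[str] = None, header: int = 0,
                footer: int = 0, limit: Optional[int] = None,
                heading_row_as_names: bool = False) -> tuple[Optional[str], list[str]]:
    """Hypothetical helper: apply row settings and return (column_names, data_rows)."""
    if row_separator is None:
        # Standard EOL: splitlines() accepts \n, \r\n and \r line endings.
        rows = text.splitlines()
    else:
        # Custom String row separator.
        rows = text.split(row_separator)
        if rows and rows[-1] == "":
            rows.pop()  # ignore the empty piece after a trailing separator
    if footer:
        # Footer: drop the last `footer` rows.
        rows = rows[:max(len(rows) - footer, 0)]
    column_names = None
    if heading_row_as_names:
        # Assumption: the first row after the skipped header rows holds the labels,
        # so the number of skipped header rows grows by 1.
        if header < len(rows):
            column_names = rows[header]
        header += 1
    rows = rows[header:]
    if limit is not None:
        # Limit: keep at most `limit` data rows.
        rows = rows[:limit]
    return column_names, rows
```

Called with `header=1, footer=1, heading_row_as_names=True` on a file made of a one-line title, a label row, three data rows and a one-line trailer, this sketch returns the label row as column names and the three data rows.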

Checking and customizing the file schema

Step 4 shows the generated schema. Note that any character that could be misinterpreted by the program is replaced by a neutral character; for example, asterisks are replaced by underscores.
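The renaming rule can be pictured with a small Python sketch. Only the asterisk-to-underscore case is stated in this guide; treating every character other than ASCII letters, digits and underscores as "misinterpretable" is an assumption made for illustration, and `sanitize_column_name` is a hypothetical helper, not a Talend function.

```python
import re


def sanitize_column_name(label: str) -> str:
    """Hypothetical helper: replace characters that are unsafe in a column name.

    Assumption: anything other than ASCII letters, digits and "_" is replaced
    by "_", which covers the documented case of "*" becoming "_".
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", label.strip())


print(sanitize_column_name("price*qty"))  # price_qty
```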

  1. Rename the schema (by default, metadata) and edit the schema columns as needed.

    Make sure the data type in the Type column is correctly defined.

    For more information regarding Java data types, including date pattern, see Java API Specification.

    Below are the commonly used Talend data types:

    • Object: a generic Talend data type that allows processing data without regard to its content. For example, a data file that is not otherwise supported can be processed with a tFileInputRaw component by specifying that it has a data type of Object.

    • List: a space-separated list of primitive type elements in an XML Schema definition, defined using the xsd:list element.

    • Dynamic: a data type that can be set for a single column at the end of a schema to allow processing fields as VARCHAR(100) columns named either as 'Column<X>' or, if the input includes a header, from the column names appearing in the header. For more information, see Dynamic schema.

    • Document: a data type that allows processing an entire XML document without regard to its content.

  2. To generate the Positional File schema again, click the Guess button. Note, however, that any edits to the schema might be lost after "guessing" the file-based schema.

  3. When done, click Finish to close the wizard.

The new schema is displayed under the relevant File positional connection node in the Repository tree view. You can drop the defined metadata from the Repository onto the design workspace as a new component, or onto an existing component, to reuse the metadata. For further information about how to use the centralized metadata in a Job, see How to use centralized metadata in a Job and How to set a repository schema.

To modify an existing file connection, right-click it in the Repository tree view and select Edit file positional to open the file metadata setup wizard.

To add a new schema to an existing file connection, right-click the connection in the Repository tree view and select Retrieve Schema from the contextual menu.

To edit an existing file schema, right-click the schema in the Repository tree view and select Edit Schema from the contextual menu.