Preparing the movies metadata - 7.3

Talend Data Fabric Getting Started Guide

Version: 7.3
Language: English
Operating system: Data Fabric
Product: Talend Data Fabric
Module: Talend Administration Center, Talend DQ Portal, Talend Installer, Talend Runtime, Talend Studio
Content: Data Quality and Preparation > Cleansing data; Data Quality and Preparation > Profiling data; Design and Development; Installation and Upgrade
Last publication date: 2023-07-24

This example describes how to set up the metadata of the source file movies.csv in the Repository. Repository metadata can be used across Jobs, allowing you to configure your Jobs quickly without having to define each parameter and schema manually.

Before you begin

  • You have the source file movies.csv ready in the directory C:\getting_started\input_data\.

Procedure

  1. In the Repository tree view, expand the Metadata node, right-click File delimited, and select Create file delimited from the contextual menu to open the New Delimited File wizard.
  2. In the New Delimited File wizard, enter a name for the file metadata (movies in this example) and any other information that helps describe it, and then click Next to define the general properties of the file.

    In this step of the wizard, Name is the only mandatory field. The information you provide in the Description field will appear as a tooltip when you move your mouse pointer over the file connection.

  3. In the File field, specify the path of the source file, or click Browse to browse to the file.

    The file is loaded, and the File Viewer area displays an excerpt of the file so that you can check the file's consistency, the presence of a header row and, more generally, the file structure.

  4. From the Format list, select your operating system, and click Next to parse the file.
  5. On the Preview tab, select the Set heading row as column names check box to retrieve the file column names from the first row, and then click Refresh Preview.

    The file preview is refreshed, and the Header check box in the Rows To Skip area is automatically selected, with the number of header rows to be skipped incremented by 1.

  6. If the file contains more than one header row to skip during parsing, enter the number of rows in the Header field and click Refresh Preview again.
  7. Click Next to retrieve the file schema.

    The Description of the Schema table displays the generated file schema.

  8. Name the schema movies_schema, then check the generated schema and edit it as needed.

    In this example, increase the length of the title and url columns.

  9. Click Finish to validate the schema and close the wizard.

    The created file metadata is shown in the Repository tree view.
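
    To make the parsing and schema steps above more concrete, the following Python sketch mimics, in a very simplified way, what happens when header rows are skipped, the heading row is used for column names, and a schema is derived from the data. It is not Talend code and does not reflect how the wizard is implemented. The function names, the Integer and String type labels, and the sample rows (including the id column) are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical sample data; the real movies.csv has its own columns and rows.
SAMPLE = (
    "id,title,url\n"
    "1,Toy Story,http://example.com/1\n"
    "2,Heat,http://example.com/22\n"
)


def is_int(value):
    """Return True if the string can be read as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def guess_type(values):
    """Guess a column type from its values (illustrative labels only)."""
    non_empty = [v for v in values if v != ""]
    if non_empty and all(is_int(v) for v in non_empty):
        return "Integer"
    return "String"


def infer_schema(text, delimiter=",", header_rows=1):
    """Return a list of (column_name, type_name, max_length) tuples.

    header_rows is the number of leading rows to skip (at least 1);
    the last skipped row supplies the column names.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    names = rows[header_rows - 1]
    data = rows[header_rows:]
    schema = []
    for index, name in enumerate(names):
        # Ignore rows that are too short to contain this column.
        values = [row[index] for row in data if index < len(row)]
        max_length = max((len(v) for v in values), default=0)
        schema.append((name, guess_type(values), max_length))
    return schema


if __name__ == "__main__":
    for column in infer_schema(SAMPLE):
        print(column)
```

    In a sketch like this, the length of each column is derived only from the rows that were read, so a longer value appearing later in the file would not fit. That is one plausible reason to review the generated schema and widen columns such as title and url before finishing the wizard.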

Results

You now have the movies file metadata ready for use. Next, you need to apply the created metadata to the component that reads the source file.