Creating and using metadata in Talend Studio - 8.0

In this tutorial, discover how creating and using metadata in Talend Studio can help you save a lot of development time.

This tutorial makes use of a .csv file. If you do not have a .csv file, click the Downloads tab and save customers_unordered.csv.

This tutorial makes use of a database. If you do not have a database, click the Downloads tab and save customers_unordered.sql. You must import the database into a compatible program.

Understanding metadata in Talend Studio

Talend Studio allows you to create and run Jobs using predefined components. You can configure each component as either a Built-in or a Repository component. The properties of a Repository component are stored as metadata in the Repository.

For Built-in components, information (such as how to read the file and what it contains):
  • is defined in the component,
  • applies only to this component,
  • cannot be reused with any other component.
For Repository components, information (such as how to read the file and what it contains):
  • is saved as metadata,
  • can be efficiently and consistently reused,
  • can be easily maintained because changes to metadata can be propagated to all Jobs that use it.
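
The difference between the two modes can be pictured with a small Java sketch. This is not Talend Studio's internal code: the class names (FileMetadata, Component) and the component names are hypothetical, and a shared object reference stands in for the propagation that Talend Studio performs on Jobs.

```java
import java.util.List;

// Hypothetical model for illustration; not Talend Studio's internal API.
class MetadataReuseDemo {

    // The information a delimited-file component needs.
    static class FileMetadata {
        String path;
        String fieldSeparator;

        FileMetadata(String path, String fieldSeparator) {
            this.path = path;
            this.fieldSeparator = fieldSeparator;
        }
    }

    static class Component {
        final String name;
        final FileMetadata settings;

        private Component(String name, FileMetadata settings) {
            this.name = name;
            this.settings = settings;
        }

        // Built-in: the component owns a private copy of the information.
        static Component builtIn(String name, String path, String separator) {
            return new Component(name, new FileMetadata(path, separator));
        }

        // Repository: the component points at a shared definition.
        static Component fromRepository(String name, FileMetadata shared) {
            return new Component(name, shared);
        }
    }

    public static void main(String[] args) {
        FileMetadata customers = new FileMetadata("customers_unordered.csv", ";");

        Component a = Component.fromRepository("input_A", customers);
        Component b = Component.fromRepository("input_B", customers);
        Component c = Component.builtIn("input_C", "customers_unordered.csv", ";");

        // One change to the shared definition reaches every component that uses it.
        customers.fieldSeparator = ",";

        for (Component component : List.of(a, b, c)) {
            System.out.println(component.name + " uses separator " + component.settings.fieldSeparator);
        }
    }
}
```

In this sketch, input_A and input_B pick up the new separator, while input_C keeps its own value and would have to be edited separately.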

Creating a Talend Studio project

Creating a project is the first step to using Talend Studio. Projects allow you to better organize your work.

Procedure

  1. Select Create a new project.
  2. Enter a name for your project.

    Example

    TalendDemo
  3. Click Create.
  4. Click Finish.

Results

Your project opens. You are ready to work in Talend Studio.

Creating a Job to use metadata

Talend Studio projects contain Jobs. A Job is a workflow built from components, each of which performs a specific action.

Before you begin

Select the Integration perspective (Window > Perspective > Integration).

Procedure

  1. In Repository, right-click Job Designs.
    1. Click Create Standard Job.
  2. In the Name field, enter a name.

    Example

    useMetadata
  3. Optional: In the Purpose field, enter a purpose.

    Example

    Display the use of Metadata in Talend Studio
  4. Optional: In the Description field, enter a description.

    Example

    A simple job to demonstrate built-in properties vs. metadata
    Tip: Enter a Purpose and Description to stay organized.
  5. Click Finish.

Results

The Designer opens an empty Job.

Metadata configuration

By configuring metadata, you define information once and reuse it across your Talend Studio components.

Creating a metadata definition

Creating a metadata definition allows you to set up reusable information across all of your components.

Before you begin

  • This tutorial makes use of a .csv file. If you do not have a .csv file, click the Downloads tab and save customers_unordered.csv.

  • This tutorial also makes use of another delimited file. If you do not have another delimited file, click the Downloads tab and save directors.txt.

Procedure

  1. In the Repository, expand Metadata then right-click File delimited and click Create file delimited.
  2. In the Name field, enter a name.

    Example

    Customers
  3. Optional: In the Purpose field, enter a purpose.

    Example

    Creating reusable metadata thanks to a .csv file
  4. Optional: In the Description field, enter a description.

    Example

    Reusable shareable customer metadata
    Tip: Enter a Purpose and Description to stay organized.
  5. Click Next.
    You are brought to Step 2 in the wizard.
  6. Click Browse and select the file of your choice in the File Explorer.
  7. Click Next.
    You are brought to Step 3 in the wizard.
  8. Optional: Define the parse settings.

    Example

    • Under File Settings, check the Field Separator and change it if needed.
      Note: The most common Field Separator is a semicolon (;).
    • Under Preview, select Set heading row as column names. The Header field is automatically set to 1, meaning the first row of your file contains the column names.
    Tip: Under Preview, click Refresh Preview to check the parsing results.
  9. Click Next.
    You are brought to Step 4 in the wizard.
  10. Optional: In the Name field, enter a name.

    Example

    customersSchema
  11. Optional: Update the Schema so it is identical to the structure of the sample file.

    Example

    Change the Type of the postcode column from a number type to String. A postcode is an identifier and is never used in arithmetic.
  12. Click Finish.

Results

In the Repository, under Metadata, you can find and use your metadata.
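
The parse settings from the wizard can be illustrated with a short Java sketch. It is an assumption-laden illustration, not the code Talend Studio generates: the parse method is hypothetical, the sample rows are invented (they are not taken from customers_unordered.csv), and quoting or escaping of separators is ignored. It also shows one practical reason to keep a postcode as a String: a numeric type drops leading zeros.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the Field Separator and Header settings.
class DelimitedParseSketch {

    // Skips `header` rows, using the last skipped row as column names,
    // then splits every remaining line on the separator.
    static List<Map<String, String>> parse(List<String> lines, String separator, int header) {
        String regex = Pattern.quote(separator);
        String[] names = header > 0 ? lines.get(header - 1).split(regex, -1) : null;
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(header, lines.size())) {
            String[] values = line.split(regex, -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < values.length; i++) {
                // Fall back to generated names when no header name is available.
                String name = (names != null && i < names.length) ? names[i] : "Column" + i;
                row.put(name, values[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Invented sample data: one header row, then one record.
        List<String> lines = List.of("id;name;postcode", "1;Ada;02134");
        List<Map<String, String>> rows = parse(lines, ";", 1);

        String postcode = rows.get(0).get("postcode");
        System.out.println("As String:  " + postcode);
        System.out.println("As Integer: " + Integer.parseInt(postcode)); // leading zero is lost
    }
}
```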

Configuring a component through metadata

Configuring a component through metadata fills in its properties from a metadata definition stored in the Repository, instead of defining them in the component itself.

Before you begin

You must have created a metadata definition (see Creating a metadata definition).

Procedure

  1. Click inside the Designer.
  2. Add a tFileInputDelimited component.
    Note: By default, the component is configured with Built-in parameters.
  3. Double-click the tFileInputDelimited component.
    1. In the Property Type list, select Repository.
    2. Click the […] button next to the Repository field.
    3. Under Metadata > File delimited, select a metadata definition.

      Example

      Customers 0.1
      All the component fields are filled in with the metadata information. They now appear in gray, indicating that they belong to the metadata and not to the component.

Results

You have successfully configured a component through metadata.
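
One way to think about the grayed-out fields is as values owned by the metadata definition. The Java sketch below models that idea under stated assumptions: PropertyType, InputComponent, and the field names are hypothetical, and treating the fields as strictly read-only is a simplification of what Talend Studio does in its user interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the Property Type switch; not Talend Studio's API.
class PropertyTypeSketch {

    enum PropertyType { BUILT_IN, REPOSITORY }

    static class InputComponent {
        PropertyType propertyType = PropertyType.BUILT_IN; // default: Built-in
        private final Map<String, String> fields = new LinkedHashMap<>();

        // Built-in fields are edited directly on the component.
        void setField(String name, String value) {
            if (propertyType == PropertyType.REPOSITORY) {
                throw new IllegalStateException(name + " belongs to the metadata definition");
            }
            fields.put(name, value);
        }

        // Selecting a definition fills every field from the metadata.
        void useRepository(Map<String, String> definition) {
            fields.clear();
            fields.putAll(definition);
            propertyType = PropertyType.REPOSITORY;
        }

        String getField(String name) {
            return fields.get(name);
        }
    }

    public static void main(String[] args) {
        Map<String, String> customers =
                Map.of("File name", "customers_unordered.csv", "Field Separator", ";");

        InputComponent input = new InputComponent();
        input.useRepository(customers);
        System.out.println("Field Separator: " + input.getField("Field Separator"));

        try {
            input.setField("Field Separator", ",");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```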

Creating a metadata definition from a database

In Talend Studio, you can fetch a metadata definition from a database, so you can reuse existing table structures instead of defining schemas by hand.

Before you begin

This tutorial makes use of a database. If you do not have a database, click the Downloads tab and save customers_unordered.sql. You must import the database into a compatible program.

Procedure

  1. In the Repository, expand Metadata then right-click Db Connections and click Create connection.
    You are brought to Step 1 in the wizard.
  2. In the Name field, enter a name.

    Example

    MySQL
  3. Optional: In the Purpose field, enter a purpose.

    Example

    Demonstrate how to fetch metadata from a database
  4. Optional: In the Description field, enter a description.

    Example

    Fetching CSV data imported into a MySQL database
    Tip: Enter a Purpose and Description to stay organized.
  5. Click Next.
    You are brought to Step 2 in the wizard.
  6. Enter your connection details.
    Tip: To check the connection to the database, click Test connection.
  7. Click Finish.
    You are brought to the Designer.
  8. Right-click your metadata in the Repository.
    1. Click Retrieve Schema.
      You are brought to the Filter for the Table window.
    2. Click Next.
      You are brought to the Add a Schema on repository window.
    3. Select the check box next to the table name.
    4. Click Next.
      The database schema displays.
    5. Optional: Change the Schema parameters.
    6. Click Finish.

Results

The schemas of the selected tables have been imported as metadata and can be reused in your components.
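
For readers curious about what retrieving a schema involves, here is a minimal Java sketch based on the standard JDBC metadata API. It does not reproduce Talend Studio's own retrieval logic. The host, port, and database name (localhost, 3306, talend_demo) are placeholders, and the helper names mysqlUrl and readColumns are hypothetical. Calling readColumns requires an open connection, which in turn needs a MySQL JDBC driver on the classpath, so the main method only prints the connection URL.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch using standard JDBC; helper names and connection values are assumptions.
class RetrieveSchemaSketch {

    // Builds a MySQL JDBC URL from typical connection details.
    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Reads column names and type names for one table through JDBC metadata.
    static Map<String, String> readColumns(Connection connection, String table) throws SQLException {
        Map<String, String> columns = new LinkedHashMap<>();
        DatabaseMetaData meta = connection.getMetaData();
        try (ResultSet rs = meta.getColumns(connection.getCatalog(), null, table, "%")) {
            while (rs.next()) {
                columns.put(rs.getString("COLUMN_NAME"), rs.getString("TYPE_NAME"));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Placeholder connection details; replace them with your own.
        System.out.println(mysqlUrl("localhost", 3306, "talend_demo"));
    }
}
```

With a driver available, a connection opened through java.sql.DriverManager.getConnection could be passed to readColumns to list the columns of an imported table.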

Using metadata to read a database and displaying the results

You can read and manage databases in Talend Studio, allowing you to integrate them into your data management workflows.

Before you begin

  • This tutorial makes use of a database. If you do not have a database, click the Downloads tab and save customers_unordered.sql. You must import the database into a compatible program.

  • You must have created a metadata definition from a database (see Creating a metadata definition from a database).

Procedure

  1. In the Repository, expand Metadata > Db Connections.
  2. Drag and drop a database metadata definition onto the Designer.
    1. Select a tDBInput component.
      The component inherits the database schema.
  3. In the Designer, add a tLogRow component.
  4. Right-click the tDBInput component.
    1. Select Row > Main.
    2. Click on the tLogRow component to link the two.
  5. Optional: In the tLogRow component, select the Table Mode.
  6. In the Run view, click Run.

Results

The tDBInput component, configured through metadata, reads your database, and the tLogRow component displays its content on the console.
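
The flow in this Job, rows read by one component and displayed by the next, can be sketched in plain Java. The example is only an illustration: the rows are invented stand-ins for what tDBInput would read, the toTable method is hypothetical, and the table layout is not tLogRow's exact console format.

```java
import java.util.List;

// Hypothetical sketch of a table-style row display; not tLogRow's real output.
class LogRowSketch {

    // Renders rows as a fixed-width table: header, separator line, one line per row.
    static String toTable(List<String> columns, List<List<String>> rows) {
        int[] widths = new int[columns.size()];
        for (int i = 0; i < widths.length; i++) {
            // Each column is at least one character wide and fits its longest value.
            widths[i] = Math.max(1, columns.get(i).length());
            for (List<String> row : rows) {
                widths[i] = Math.max(widths[i], row.get(i).length());
            }
        }
        StringBuilder out = new StringBuilder();
        appendLine(out, columns, widths);
        for (int width : widths) {
            out.append('|').append("-".repeat(width));
        }
        out.append("|\n");
        for (List<String> row : rows) {
            appendLine(out, row, widths);
        }
        return out.toString();
    }

    // Appends one line of left-aligned, padded cells.
    private static void appendLine(StringBuilder out, List<String> cells, int[] widths) {
        for (int i = 0; i < widths.length; i++) {
            out.append('|').append(String.format("%-" + widths[i] + "s", cells.get(i)));
        }
        out.append("|\n");
    }

    public static void main(String[] args) {
        // Invented rows standing in for the database content.
        List<String> columns = List.of("id", "name");
        List<List<String>> rows = List.of(List.of("1", "Ada"), List.of("2", "Grace"));
        System.out.print(toTable(columns, rows));
    }
}
```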