Data lineage - Cloud

Talend Cloud API Services Platform Studio User Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Design and Development
EnrichPlatform: Talend Management Console, Talend Studio

Data lineage traces the data flow backward, from the data destination (output component), through the intermediate components and stages, to the data source (input component). The data lineage results show the life cycle of the data flow between components, including the operations performed on the data.
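To make the backward tracing concrete, here is a minimal Python sketch that walks a Job's component graph from the output component back to its input components and reports the operation each component performs. The graph, the component instances (tMysqlOutput_1, tMap_1, tMysqlInput_1, tFileInputDelimited_1), the operation labels, and the helpers trace_lineage and sources are hypothetical illustrations, not Talend Studio's internal model or API.

```python
from collections import deque

# Hypothetical Job graph (illustrative assumption, not Talend's internal model):
# each component maps to the upstream components that feed data into it.
UPSTREAM = {
    "tMysqlOutput_1": ["tMap_1"],
    "tMap_1": ["tMysqlInput_1", "tFileInputDelimited_1"],
    "tMysqlInput_1": [],
    "tFileInputDelimited_1": [],
}

# Hypothetical description of the operation each component performs on the data.
OPERATIONS = {
    "tMysqlOutput_1": "write rows to the target table",
    "tMap_1": "join and transform columns",
    "tMysqlInput_1": "read the employees table",
    "tFileInputDelimited_1": "read a delimited file",
}


def trace_lineage(destination, upstream):
    """Walk backward from the destination component to every source component.

    Returns components in breadth-first order, starting at the destination.
    """
    order = []
    seen = {destination}
    queue = deque([destination])
    while queue:
        component = queue.popleft()
        order.append(component)
        for parent in upstream.get(component, []):
            if parent not in seen:  # avoid revisiting shared upstream components
                seen.add(parent)
                queue.append(parent)
    return order


def sources(destination, upstream):
    """Return the components with no upstream input, i.e. the data sources."""
    return [c for c in trace_lineage(destination, upstream) if not upstream.get(c)]


if __name__ == "__main__":
    for component in trace_lineage("tMysqlOutput_1", UPSTREAM):
        print(f"{component}: {OPERATIONS[component]}")
```

The breadth-first walk lists components in order of distance from the output, which mirrors reading lineage results from the destination back toward the sources.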

Talend Studio also allows you to produce detailed documentation of the data lineage results in HTML and XML. For more information, see Exporting the results of impact analysis/data lineage to HTML and Exporting the results of impact analysis/data lineage to XML.

Warning: All items on which you want to execute impact analysis or data lineage must be centralized in the Repository tree view under any of the following nodes: Joblets Designs, Contexts, SQL Templates, Reference project, or Metadata.

The example below shows data lineage performed on a database connection item stored under the Metadata node in the Repository tree view.

To run data lineage on a metadata item, do the following:

Procedure

  1. In the Repository tree view, expand Metadata > Db Connection and then expand the database connection you want to analyze, mysql in this example.
  2. Right-click the centralized table schema whose data flow life cycle you want to analyze, employees in this example, and select Impact Analysis from the contextual menu.
    The Impact Analysis view displays the Jobs that use the selected table schema. The names of the selected database connection and table schema are displayed in the corresponding fields.
  3. From the Column list, select the column whose data flow you want to trace from the data destination (output component), through the intermediate components and stages, back to the data source (input component). In this example, the column to analyze is Name.
    You can skip this step by right-clicking the column Name in the Repository tree view and selecting Impact Analysis from the contextual menu.
  4. Click Data Lineage.
    A bar appears to indicate the progress of the analysis operation and the analysis results are displayed in the view.
  5. Right-click a listed Job and select Open Job from the contextual menu.
    The Job opens in the design workspace.
    The data lineage results trace back the components and transformations that the data in the output column Name passes through before it is written to this column.
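At column level, the same idea follows a single output column back hop by hop, recording the operation applied at each stage. The Python sketch below assumes a hypothetical mapping in which each column has exactly one upstream column; real Jobs can derive a column from several inputs. The component instances, the tMap expression, and the column_lineage helper are illustrative assumptions, not results produced by Talend Studio for this example.

```python
# Hypothetical column-level mapping (illustrative assumption):
# (component, column) -> (upstream component, upstream column, operation applied).
COLUMN_SOURCES = {
    ("tMysqlOutput_1", "Name"): ("tMap_1", "Name", "pass-through"),
    ("tMap_1", "Name"): ("tMysqlInput_1", "Name", "StringHandling.UPCASE(row1.Name)"),
}


def column_lineage(component, column, mapping):
    """Return the hops from an output column back to its source column.

    Each hop is (component, column, operation); the last hop is the source.
    """
    hops = []
    current = (component, column)
    visited = set()
    while current in mapping:
        if current in visited:  # guard against a malformed, cyclic mapping
            raise ValueError(f"cycle detected at {current}")
        visited.add(current)
        parent_component, parent_column, operation = mapping[current]
        hops.append((current[0], current[1], operation))
        current = (parent_component, parent_column)
    hops.append((current[0], current[1], "source"))
    return hops


if __name__ == "__main__":
    for comp, col, op in column_lineage("tMysqlOutput_1", "Name", COLUMN_SOURCES):
        print(f"{comp}.{col} <- {op}")
```

Tracing the output column Name this way ends at the input component's Name column, which corresponds to the backward path the Data Lineage results describe.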