Tracing data flow lineage - 8.0

Talend Data Catalog User Guide

Version: 8.0
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Catalog
Content: Data Governance
Last publication date: 2023-09-26
The data flow lineage feature lets you narrow in on specific objects and shows how they are related to each other within a model, an external metadata repository, or a configuration. Data flow lineage is based on connection definitions to data stores and on the physical transformation rules that transform and move the data.
There are three types of data flow analysis:
  • Data lineage applies upstream in the data flow. It answers a reverse lineage question, such as where the information comes from.
  • Data impact applies downstream in the data flow. It answers a forward lineage question, such as what will be impacted by a change.
  • Full data lineage applies both upstream and downstream in the data flow. It combines the two previous types in a single trace.
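The three analysis types can be pictured as traversals of a directed graph in which each edge means "data moves from this object to that one". The following is a minimal conceptual sketch, assuming a hypothetical in-memory edge list; the object names and functions (`data_lineage`, `data_impact`, `full_lineage`) are illustrative only and are not part of Talend Data Catalog or its API.

```python
from collections import deque

# Hypothetical data flow: each edge (source, target) means data moves from
# source to target. Names are illustrative, not Talend Data Catalog identifiers.
EDGES = [
    ("crm.customers.email", "staging.customers.email"),
    ("staging.customers.email", "dw.dim_customer.email"),
    ("dw.dim_customer.email", "report.customer_list.email"),
]


def build_indexes(edges):
    """Index the edges in both directions: downstream (forward) and upstream (reverse)."""
    downstream, upstream = {}, {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
        upstream.setdefault(dst, []).append(src)
    return downstream, upstream


def _walk(start, neighbors):
    """Breadth-first traversal returning every object reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def data_lineage(obj, edges):
    """Upstream trace: where does the information come from?"""
    _, upstream = build_indexes(edges)
    return _walk(obj, upstream)


def data_impact(obj, edges):
    """Downstream trace: what would be impacted by a change?"""
    downstream, _ = build_indexes(edges)
    return _walk(obj, downstream)


def full_lineage(obj, edges):
    """Both directions combined in a single result."""
    return data_lineage(obj, edges) | data_impact(obj, edges)
```

For example, `data_lineage("dw.dim_customer.email", EDGES)` returns the two upstream columns in the CRM and staging layers, while `data_impact` on the same column returns only the report column. The full trace is simply the union of both directions.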
You can use the Data Flow tab for different use cases and scopes:
  • You can invoke a lineage and/or impact trace from the Data Flow tab or from the context menu of a classifier (table, file, entity, and so on) or a feature (column, field, attribute, and so on). The trace is end to end across all the models and mappings in your current configuration.
  • You can invoke a lineage overview from the Data Flow tab on the detail page of a model, schema, ETL job, BI design, and so on. The overview presents the lineage within that model, even if the model is not stitched to other models.

A data flow lineage trace presents summary lineage, as opposed to the data flow overview, which presents detailed transformation lineage. When you trace the impact or lineage of a table or column, you do not see every transformation. Instead, you see a summary of the whole job.
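One way to picture the difference between summary and detailed lineage is to collapse the internal transformation steps of a job into direct source-to-target edges. The sketch below is a conceptual illustration only, assuming a hypothetical detailed graph in which the intermediate steps of one job are listed in `INTERNAL`; it does not describe how Talend Data Catalog computes its summaries.

```python
# Hypothetical detailed lineage inside one ETL job. Each key maps to the nodes
# it feeds; "job.step_*" nodes are transformation steps internal to the job.
DETAILED = {
    "src.orders.amount": ["job.step_convert"],
    "src.orders.currency": ["job.step_convert"],
    "job.step_convert": ["job.step_aggregate"],
    "job.step_aggregate": ["tgt.sales.total"],
}
INTERNAL = {"job.step_convert", "job.step_aggregate"}


def summarize(detailed, internal):
    """Collapse internal transformation steps into direct source-to-target edges."""
    summary = set()
    for src in detailed:
        if src in internal:
            continue  # summaries start only from objects outside the job
        stack, seen = list(detailed[src]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in internal:
                # Step through the hidden transformation to what it feeds.
                stack.extend(detailed.get(node, []))
            else:
                summary.add((src, node))
    return summary
```

Here the detailed view shows a conversion step and an aggregation step, whereas `summarize(DETAILED, INTERNAL)` keeps only the two edges from `src.orders.amount` and `src.orders.currency` to `tgt.sales.total`.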

Constants are not displayed on the lineage diagram. If a process has a constant as its only source, that process does not appear in the lineage trace.
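The rule for constants can be expressed as a simple filter. This is a minimal sketch, assuming a hypothetical representation in which each process lists its sources as `(kind, name)` tuples; the process names and the `visible_in_trace` function are illustrative and not part of the product.

```python
# Hypothetical processes and their sources. Each source is (kind, name),
# where kind is either "constant" or "object".
PROCESSES = {
    "set_load_flag": [("constant", "'Y'")],
    "copy_email": [("object", "crm.customers.email")],
    "tag_region": [("constant", "'EU'"), ("object", "crm.customers.country")],
}


def visible_in_trace(processes):
    """Drop constant sources, then keep only processes with a remaining source."""
    visible = {}
    for name, sources in processes.items():
        kept = [s for s in sources if s[0] != "constant"]
        if kept:
            visible[name] = kept
    return visible
```

With this input, `set_load_flag` disappears because its only source is a constant, while `tag_region` remains visible with just its non-constant source.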