
Semantic Flow

Semantic Definition Lookup

In this use case, one has found a data element (a column in a table in a database for example, or a field in a report) and wants to understand what it means. By defining the semantic links properly, Talend Data Catalog can trace back through the physical data flow (as long as there is no transformation which would change the meaning) to an element that is mapped to a term in the glossary and thus find a useful definition.

The caveat that the above only works “as long as there is no transformation which would change the meaning” implies that some subset of the fields in your reports will not provide a semantic definition. The trace will simply stop at the transformation and never get to a model (again likely the data warehouse) that has semantic lineage.
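The trace described above can be sketched as a small graph walk. This is a minimal illustration under stated assumptions, not the Talend Data Catalog implementation or API: the `upstream` and `term_of` dictionaries, the `lookup_definition` function, and the report and element names are hypothetical stand-ins for the data flow and the semantic mappings.

```python
from collections import deque

# Hypothetical toy model, not the Talend Data Catalog API.
# upstream[x] lists (source, is_pass_through) pairs that feed element x.
upstream = {
    "report.AvailableAmount": [("dw.GLAccount.AccountAmountAvailable", True)],
    "report.NetAccountAmount": [("dw.GLAccount.AccountAmountAvailable", False)],
}
# term_of[x] is the glossary term semantically mapped to element x.
term_of = {"dw.GLAccount.AccountAmountAvailable": "Account Amount Available"}


def lookup_definition(element):
    """Walk upstream over pass-through links only; return the first mapped term found."""
    queue = deque([element])
    seen = {element}
    while queue:
        current = queue.popleft()
        if current in term_of:
            return term_of[current]
        for source, is_pass_through in upstream.get(current, []):
            # A transformation may change the meaning, so the trace stops there.
            if is_pass_through and source not in seen:
                seen.add(source)
                queue.append(source)
    return None
```

In this sketch, `report.AvailableAmount` reaches the mapped warehouse column through a pass-through link and resolves to a term, while `report.NetAccountAmount` is fed by a transformation, so the walk ends without a definition.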

So, in addition to this method of “trace through the dataflow as long as there is no transformation which would change the meaning”, there is another which is based on search, or name matching. In this case, if there is a field in a report named “Net Account Amount” and it does not have a good data flow trace without transformation, one could still create a term in the glossary named “Net Account Amount”. When requesting a data element definition lookup in that case, Talend Data Catalog will perform a search for that term and report its definition, even without a clean lineage trace. In most cases, it will be necessary to fill in the blanks for some of these fields by adding terms to the glossary.

Of course, it is quite possible that no term directly matches the report field by name. In this case, one may define a direct object relationship, such as a term classification, from a term in the glossary to the field in the report. The advantage of this approach over the name matching method is that one may control precisely what the preferred definition will be. It also provides a definition even when no transformation-free data flow trace exists. Hence, it is the preferred method for fields which have no equivalent in the warehouse or lake (i.e., calculated in the report) and for which either no term or multiple terms match by name.

Information note

All these types of semantic definitions can be turned on or off in the semantic usage widget of the customized presentation UI. That is, once you have customized the Overview page to show the widget, users can select which kinds of semantic definition they want to see there. This selection does not change what is used for Documentation (Name and Business Definition).

To summarize, there are seven methods used to provide an answer to a definition lookup.

The preference for which result is used is based on a ranking system, in descending order of the list above. Thus, a DOCUMENTED result gets preference over CLASSIFIED, etc., for the Name and Business Definition.
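The ranking rule can be illustrated with a short sketch. It assumes that each lookup result carries a method label and that the ranking is supplied as an ordered list. The `pick_preferred` function and the result dictionaries are hypothetical, and only the relative order of DOCUMENTED and CLASSIFIED is taken from the text.

```python
def pick_preferred(results, ranking):
    """Return the result whose method appears earliest in ranking, or None."""
    position = {method: i for i, method in enumerate(ranking)}
    ranked = [r for r in results if r["method"] in position]
    if not ranked:
        return None
    return min(ranked, key=lambda r: position[r["method"]])


# Only the relative order of these two methods is stated in the text;
# the complete ranking is an input to this sketch.
ranking = ["DOCUMENTED", "CLASSIFIED"]
results = [
    {"method": "CLASSIFIED", "definition": "definition from a classified term"},
    {"method": "DOCUMENTED", "definition": "definition entered on the object"},
]
preferred = pick_preferred(results, ranking)
```

Here `preferred` is the DOCUMENTED result, because it ranks ahead of CLASSIFIED regardless of the order in which the results were found.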

Example

Navigate to AccountAmountAvailable, which is a column in the GLAccount table in the Dimensional DW in the demo.

Information note

The Documentation including Name and Business Definition for the view column is already populated. It was determined based upon a Term definition.

Click on the Semantic Flow tab, go to the List tab on the left, and you see the inferred semantic definition from a term in the Glossary:

Then click the Diagram tab on the left and you see the actual semantic lineage trace that got to the term.

Click Columns and select the specific column to show in the trace:

Information note

The actual result used for the Name and Business Definition is the Is Defined By associated term. This is based upon a ranking system that is in descending order in the list of types. Thus, a Documented result gets preference over Term Defined, etc., for the Name and Business Definition.

Semantic Usage

In this scenario, starting from a glossary term or a conceptual/logical model element, one may wish to see how that semantic element is used in the architecture: that is, to discover which data elements are semantically mapped to it in the data flow architecture and thus would be impacted by a change to the term or model element.

A semantic usage lineage trace is nearly the reverse of a semantic definition lookup. In general, it is requested from the object page of a term or logical model element. The usage trace itself proceeds down each semantic link and then traces the data flow where there are no transformations (pass-through lineage) to all objects which may be reached in this manner.
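A usage trace of this kind can be sketched as a forward walk. This is an illustration under assumptions, not the product's implementation: `semantic_links`, `downstream`, `trace_usage`, and the element names are hypothetical, and the two result labels are borrowed from the result types named on this page.

```python
from collections import deque

# Hypothetical toy model, not the Talend Data Catalog API.
# semantic_links[term] lists the elements mapped directly to the term.
semantic_links = {"Account Amount Available": ["dw.GLAccount.AccountAmountAvailable"]}
# downstream[x] lists (target, is_pass_through) pairs fed by element x.
downstream = {
    "dw.GLAccount.AccountAmountAvailable": [
        ("mart.Account.AmountAvailable", True),
        ("report.NetAccountAmount", False),
    ],
    "mart.Account.AmountAvailable": [("report.AvailableAmount", True)],
}


def trace_usage(term):
    """Follow each semantic link, then pass-through data flow, and label what is reached."""
    mapped = semantic_links.get(term, [])
    usage = {element: "Mapped Semantically" for element in mapped}
    queue = deque(mapped)
    while queue:
        current = queue.popleft()
        for target, is_pass_through in downstream.get(current, []):
            # Elements behind a transformation are not treated as equivalent.
            if is_pass_through and target not in usage:
                usage[target] = "Inferred from Lineage"
                queue.append(target)
    return usage
```

Calling `trace_usage("Account Amount Available")` on this toy data reports the directly mapped warehouse column plus the two elements reachable through pass-through links, and omits the report field that sits behind a transformation.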

Example

Navigate to the term Account Amount Available in the Finance Glossary. Then click the Semantic Flow tab and List tab on the left.

You see a list of all the objects that express the term Account Amount Available, whether by semantic mapping, including a term mapping link (Mapped Semantically), by inferred equivalence due to pass-through lineage in the data flow (Inferred from Lineage), by inferred equivalence due to data classification (Inferred from Classification), or as the result of a name matching search (Searched).

Click the Diagram tab to see these traces:

Semantic Relationship Types

Semantic and data flow lineage traces report a number of elements in the semantic definition lookup and usage reports.

For the inferred results, priority is given to certain types of objects, in this order from highest to lowest:

  • Term
  • Data model (e.g., an erwin or ER/Studio model) object
  • Other objects.

Finally, given multiple results of the same type:

  • E.g., given three inferred terms, priority is given to the term which is directly adjacent (directly mapped/classified).
  • E.g., given several objects in the data flow with pass-through lineage, priority is given to the object which is directly adjacent in the data flow.
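The two-level priority for inferred results can be sketched as a sort key. This is a hedged illustration: the `pick_inferred` function, the candidate dictionaries, and the `distance` field are hypothetical. Counting hops, with 1 meaning directly adjacent, is an assumption that generalizes the adjacency rule stated above.

```python
# Lower number means higher priority; any other object type ranks last.
TYPE_PRIORITY = {"term": 0, "data_model": 1}


def pick_inferred(candidates):
    """Prefer terms, then data model objects, then other objects; break ties by adjacency."""
    return min(
        candidates,
        key=lambda c: (TYPE_PRIORITY.get(c["type"], 2), c["distance"]),
    )


# distance is the assumed number of hops from the element; 1 means directly adjacent.
candidates = [
    {"name": "some column", "type": "other", "distance": 1},
    {"name": "Term A", "type": "term", "distance": 3},
    {"name": "Term B", "type": "term", "distance": 1},
    {"name": "model attribute", "type": "data_model", "distance": 1},
]
best = pick_inferred(candidates)
```

With these candidates, `best` is Term B: both terms outrank the data model object and the other object, and Term B is the directly adjacent one.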
