Ontologies used in the studio - 6.2

Talend MDM Platform Studio User Guide

What is an Ontology:

An Ontology describes concepts, their attributes, and the relationships that can exist between them for data held in multiple columns. For example, customer is a concept, and date of birth and name are attributes of that concept. An Ontology lists concepts, attributes and synonyms of the attributes.
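
The structure described above can be pictured with a minimal sketch. This is not the studio's internal model: the class names Concept and Attribute, the find_attribute helper and the synonym values are all hypothetical and only illustrate how concepts, attributes and synonyms relate to each other.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribute:
    """One attribute of a concept, with alternative labels (synonyms)."""
    name: str
    synonyms: List[str] = field(default_factory=list)


@dataclass
class Concept:
    """A concept and the attributes that describe it."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)

    def find_attribute(self, label: str) -> Optional[Attribute]:
        """Return the attribute whose name or synonym equals label, ignoring case."""
        wanted = label.strip().lower()
        for attribute in self.attributes:
            labels = [attribute.name] + attribute.synonyms
            if wanted in (item.lower() for item in labels):
                return attribute
        return None


# The synonyms below are invented for illustration only.
customer = Concept(
    name="customer",
    attributes=[
        Attribute("date of birth", synonyms=["birth date", "DOB"]),
        Attribute("name", synonyms=["full name"]),
    ],
)
```

With this sketch, a column headed DOB and a column headed date of birth both resolve to the same attribute of the customer concept.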

What an Ontology is used for in the studio:

Using the studio with the Ontology repository stored on the log server enables knowledge sharing: you can reuse indicators and patterns that have already been analyzed and found to best suit the type of data you analyze.

The studio analyzes column content using a set of methods (regex, data dictionary and keyword dictionary) and then decides which category the data falls into. For example, for data like:

  • user@talend.com, the studio analyzes it against a regex and finds it to be an EMAILADDRESS,

  • John, the studio analyzes it against the data dictionary and finds it to be a FIRSTNAME,

  • 43 Chester Road, the studio analyzes the tokens in the data string against keywords in the dictionary and, because Road is a known keyword, finds the string to be an ADDRESSLINE.
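
The three methods above can be sketched in a few lines of Python. This is an illustrative approximation only, not the studio's implementation: the function name categorize, the email regex, the dictionary contents and the order in which the methods are tried are all assumptions made for the example.

```python
import re

# Hypothetical, tiny stand-ins for the resources embedded in the studio.
EMAIL_REGEX = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
FIRSTNAME_DICTIONARY = {"john", "mary", "paul"}
ADDRESS_KEYWORDS = {"road", "street", "avenue"}


def categorize(value):
    """Return an illustrative category name for one value, or None if nothing matches."""
    text = value.strip()

    # Regex: the whole value must match the pattern.
    if EMAIL_REGEX.fullmatch(text):
        return "EMAILADDRESS"

    # Data dictionary: the whole value must be a known entry.
    if text.lower() in FIRSTNAME_DICTIONARY:
        return "FIRSTNAME"

    # Keyword dictionary: it is enough that one token is a known keyword.
    tokens = text.lower().split()
    if any(token in ADDRESS_KEYWORDS for token in tokens):
        return "ADDRESSLINE"

    return None
```

Calling categorize on the three sample values gives EMAILADDRESS, FIRSTNAME and ADDRESSLINE respectively. The difference between the two dictionary methods is the unit of comparison: the data dictionary compares the whole value, whereas the keyword dictionary compares individual tokens.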

For further information about dictionary indexes and regex categories embedded in the Studio, see the Knowledge Base article Indexes and regex categories used in the Semantic-aware analysis.

What Ontologies are used in the studio:

An Ontology has been built on the log server by merging two business standards, UBL and OAGI:

  • Universal Business Language (UBL): An OASIS effort to create a synthesis of existing XML business document libraries into one universal business language.

  • Open Applications Group (OAGI): OAGI defines a common content model and common messages for communication between business applications.

The merge results in 412 concepts that apply to several domains, including customer, company, geography, product and finance.

For further information about the content of the Ontology repository, see the Knowledge Base article Accessing semantic concepts stored in the Ontology repository.