What is an ontology?
An ontology is a description of the concepts, their attributes and the relationships that can exist for data in multiple columns. For example, a customer column is a concept, and date of birth and name are attributes of that concept. An ontology lists concepts, attributes and synonyms of the attributes.
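As a rough illustration of this structure, the sketch below models a concept with its attributes and their synonyms. The class names, the `find_attribute` helper and the synonym values are hypothetical and chosen only for the example; they do not reflect how the studio stores its ontology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribute:
    """An attribute of a concept, with alternative names (synonyms)."""
    name: str
    synonyms: List[str] = field(default_factory=list)


@dataclass
class Concept:
    """A concept and the attributes that describe it."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)

    def find_attribute(self, label: str) -> Optional[Attribute]:
        """Return the attribute whose name or synonym matches the label."""
        wanted = label.strip().lower()
        for attribute in self.attributes:
            known_names = [attribute.name] + attribute.synonyms
            if wanted in (n.lower() for n in known_names):
                return attribute
        return None


# Hypothetical synonyms, for illustration only.
customer = Concept(
    name="customer",
    attributes=[
        Attribute("date of birth", synonyms=["birth date", "DOB"]),
        Attribute("name", synonyms=["full name"]),
    ],
)
```

With this model, a column labelled `DOB` resolves to the date of birth attribute of the customer concept through `customer.find_attribute("DOB")`, which is the kind of lookup that synonyms make possible.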
What is an ontology used for in the studio?
Using the ontology repository stored on the log server with the studio enables knowledge sharing: you can reuse indicators and patterns that have already been analyzed and found to best suit the type of data you analyze.
Talend Studio analyzes column content based on a set of methods (regex, data dictionary and keyword dictionary) and then decides which category the data falls into. For example, for data like:
- email@example.com, Talend Studio analyzes it against a regex and finds it to be an EMAILADDRESS,
- John, Talend Studio analyzes it against the data dictionary and finds it to be a FIRSTNAME,
- 43 Chester Road, Talend Studio analyzes the tokens in the data string against keywords in the dictionary and finds Road to be an ADDRESSLINE.
For a list of all the dictionary indexes and regex categories used in the Semantic-aware analysis, see List of the indexes and regex categories used in the Semantic-aware analysis.
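The three methods described above can be sketched as follows. This is a minimal illustration, not the studio's implementation: the email pattern, the one-entry dictionaries, the `categorize` function and the order in which the methods are tried are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical, heavily simplified stand-ins for the studio's resources.
EMAIL_REGEX = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
DATA_DICTIONARY = {"john": "FIRSTNAME"}       # whole values
KEYWORD_DICTIONARY = {"road": "ADDRESSLINE"}  # single tokens


def categorize(value: str) -> Optional[str]:
    """Return a category for the value, or None if nothing matches."""
    text = value.strip()

    # Regex: the whole value must match the pattern.
    if EMAIL_REGEX.fullmatch(text):
        return "EMAILADDRESS"

    # Data dictionary: the whole value is looked up as one entry.
    category = DATA_DICTIONARY.get(text.lower())
    if category is not None:
        return category

    # Keyword dictionary: the value is split into tokens and each
    # token is looked up separately.
    for token in text.lower().split():
        if token in KEYWORD_DICTIONARY:
            return KEYWORD_DICTIONARY[token]

    return None
```

Calling `categorize` on the three sample values from the list returns the categories named there; a value that matches none of the methods, such as `12345`, returns `None`.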
What ontologies are used in the studio?
An ontology has been built on the log server by merging two business standards, UBL and OAGI:
- Universal Business Language (UBL): An OASIS effort to create a synthesis of existing XML business document libraries into one universal business language.
- Open Applications Group (OAGI): OAGI defines a common content model and common messages for communication between business applications.
The final outcome of the merge is 412 concepts that apply to several domains, including customer, company, geography, product and finance.
For further information about the content of the ontology repository, see The ontology repository.