Steps to use Semantic Discovery - 6.1

Talend Real-time Big Data Platform Studio User Guide

From the Studio, you can use the Semantic Discovery feature to:

  • explore the semantic categories and query complex semantic relationships in the data you analyze.

  • create table analyses preconfigured with indicators and patterns that best suit the data.

  • index and enrich the Ontology repository on the log server with semantic categories and analysis results.

    For further information about dictionary indexes and regex categories embedded in the Studio, see the Knowledge Base article Dictionary indexes used in the Semantic Discovery analysis.

    For further information about the content of the Ontology repository, see the Knowledge Base article Accessing semantic concepts stored in the Ontology repository.
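To make the idea of dictionary and regex categories more concrete, the following is a minimal sketch of how a column's values could be matched against such categories to suggest a semantic category. It is an illustration only and does not reproduce how the Studio works internally: the category names, the patterns, the dictionary terms and the functions `categorize_value` and `discover_column_categories` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical categories for illustration; NOT the Studio's built-in definitions.
REGEX_CATEGORIES = {
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE),
    "US_ZIP_CODE": re.compile(r"\d{5}(-\d{4})?"),
}
DICTIONARY_CATEGORIES = {
    "COUNTRY": {"france", "germany", "japan", "brazil"},
}


def categorize_value(value):
    """Return the set of category names that a single value matches."""
    text = value.strip()
    # Regex categories: the whole value must match the pattern.
    matches = {
        name for name, pattern in REGEX_CATEGORIES.items()
        if pattern.fullmatch(text)
    }
    # Dictionary categories: case-insensitive lookup in a set of known terms.
    matches |= {
        name for name, terms in DICTIONARY_CATEGORIES.items()
        if text.lower() in terms
    }
    return matches


def discover_column_categories(values):
    """Map each matched category to the fraction of non-empty values it covers."""
    non_empty = [v for v in values if v and v.strip()]
    if not non_empty:
        return {}
    counts = Counter()
    for v in non_empty:
        counts.update(categorize_value(v))
    # most_common() orders categories from the best match to the weakest.
    return {name: n / len(non_empty) for name, n in counts.most_common()}
```

For example, calling `discover_column_categories(["France", "japan", "Atlantis"])` reports the hypothetical `COUNTRY` category for two of the three values. A ratio of this kind is one plausible way to rank candidate categories for a column.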

Using Semantic Discovery to create preconfigured table analyses involves the following steps:

  1. Connecting to a data source from the Studio, whether it is a database, a delimited file or Hive.

    For further information, see Before you begin profiling data.

  2. Launching the log server where ontology indexes are stored.

    For further information, see Launching the server and setting preferences.

  3. Selecting a table in the data source or a view in a database connection and exploring semantic categories of data columns.

    You can also start a Semantic Discovery analysis on a set of columns in a table.

    For further information, see Exploring semantic categories of data columns.

  4. Matching column metadata and semantic categories with the concepts in the Ontology repository and outputting the matching results to show the most relevant concepts.

    For further information, see Matching column metadata and semantic categories with the concepts in the Ontology repository.

  5. Defining semantic attributes for columns and enriching the Ontology repository with column metadata and semantic categories.

    For further information, see Enriching the Ontology repository.

  6. Running the recommended table analysis and enriching the Ontology repository with the analysis results and with the indicators and patterns used on the analyzed columns.

    For further information, see Defining the recommended table analysis.