Where to start? - Cloud - 8.0

Talend Studio User Guide

Last publication date: 2024-02-29
Available in: Big Data Platform, Cloud API Services Platform, Cloud Big Data Platform, Cloud Data Fabric, Cloud Data Management Platform, Data Fabric, Data Management Platform, Data Services Platform, MDM Platform, Real-Time Big Data Platform

Talend Studio enables you to examine and collect statistics and information about the data available in database columns and in delimited files.

From the Profiling perspective, you can:

  • Design a column analysis from scratch and define the analysis settings manually.
  • Create column analyses automatically preconfigured with the indicators appropriate to the type you select.
  • Use the Semantic-aware Analysis wizard to automatically configure a column analysis based on information gathered in the semantic repository. For more information, see Steps to use the Semantic-aware analysis.

Procedure

  1. Create a column analysis:
    1. In the DQ Repository tree view, expand Data Profiling.
    2. Right-click the Analysis folder and select New Analysis.
    3. From the Column Analysis folder, select an option:
      Option
        Result

      Basic Column Analysis
        Generates an empty column analysis in which you select the columns to analyze and manually assign the indicators to each column.

        For more information, see Creating a basic analysis on a database column.

      Discrete Data Analysis
        Creates a column analysis on numerical data preconfigured with the Bin Frequency and Simple Statistics indicators. You can then further configure or modify the analysis to convert continuous data into discrete bins (ranges) according to your needs.

        For more information, see Analyzing discrete data.

      Nominal Values Analysis
        Creates a column analysis on nominal data preconfigured with indicators appropriate for nominal data, namely the Value Frequency, Simple Statistics, and Text Statistics indicators.

        For example results about these statistics, see Finalizing and executing the column analysis.

      Pattern Frequency Analysis
        Creates a column analysis preconfigured with the Pattern Frequency, Pattern Low Frequency, and row and null count indicators.

        This analysis can learn about patterns in your data. It shows frequent patterns and rare patterns so that you can identify quality issues more easily.

        For example results about these statistics, see Finalizing and executing the column analysis.

      Semantic Discovery Analysis
        Creates a column analysis preconfigured with the indicators and patterns that best suit the data, after exploring the semantic categories of the data columns and using related concepts from the semantic repository.

        For more information, see Steps to use the Semantic-aware analysis.

      Summary Statistics Analysis
        Creates a column analysis on numerical data preconfigured with the Summary Statistics and row and null count indicators.

        This gives you a good idea of the shape of your numeric data by computing the range, the interquartile range, and the mean and median values.

        For an example of the use of Summary Statistics, see Setting system or user-defined indicators and Finalizing and executing the column analysis.

  2. Profiling data in one or more columns usually involves the following steps:
    1. Connecting to the data source. For more information, see Creating connections to data sources.
    2. Defining one or more columns on which to carry out data profiling processes.
      Profiling determines the content, structure, and quality of the data in these columns.
    3. Setting predefined system indicators or user-defined indicators on the columns that need to be analyzed or monitored.
      These indicators represent the results obtained by applying different patterns.
    4. Adding to the column analyses the patterns against which to check the content, structure, and quality of the data.
    5. Generating reports from these analyses and sharing the results among team members.
      These reports let you compare current and historical statistics to determine whether data quality has improved or degraded. For more information, see What are reports?.
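
To make the indicators named in the procedure more concrete, the following Python sketch computes a row count, a null count, pattern frequencies, and summary statistics (range, interquartile range, mean, and median) for a list of values. It is an illustration only and is not how Talend Studio computes its indicators: the function names are hypothetical, and the pattern convention (letters become "a" or "A", digits become "9") and the quartile method are assumptions made for this example.

```python
import statistics
from collections import Counter


def to_pattern(value: str) -> str:
    """Map a value to a character pattern.

    Assumed convention for this sketch: lowercase letter -> 'a',
    uppercase letter -> 'A', digit -> '9', anything else unchanged.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Return the row count, null count, and pattern frequencies of a column."""
    non_null = [v for v in values if v is not None]
    patterns = Counter(to_pattern(str(v)) for v in non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        # Most frequent patterns first; rare patterns end up at the tail.
        "pattern_frequency": patterns.most_common(),
    }


def summary_statistics(numbers):
    """Return range, interquartile range, mean, and median of numeric data."""
    # statistics.quantiles uses the 'exclusive' method by default (assumption:
    # a profiling tool may use a different quartile definition).
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    return {
        "range": max(numbers) - min(numbers),
        "iqr": q3 - q1,
        "mean": statistics.mean(numbers),
        "median": statistics.median(numbers),
    }


if __name__ == "__main__":
    print(profile_column(["AB-12", "CD-34", None, "x9"]))
    print(summary_statistics([1, 2, 3, 4, 5, 6, 7]))
```

In this sketch, a value whose pattern appears only once (such as "x9" among codes shaped like "AA-99") stands out at the end of the frequency list, which is the kind of quality issue that frequent and rare patterns help you spot.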

What to do next

The Creating a basic analysis on a database column section explains the procedures to analyze the content of one or multiple columns in a database.

The Creating a basic column analysis on a file section explains the procedures to analyze columns in delimited files.

Talend Studio provides lock modes: if you are the first user to open an item, you can lock that item and thus have "read and write" rights. All other users who try to open the same item at the same time have read-only access. For more information, see Lock principle.
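
The first-opener rule described above can be pictured with a minimal Python sketch. It is a conceptual model only, not Talend Studio's implementation: the class name `ItemLock`, its methods, and the assumption that closing an item releases the lock are all hypothetical.

```python
class ItemLock:
    """Hypothetical first-opener-wins lock for a single repository item."""

    def __init__(self):
        # User currently holding "read and write" rights, if any.
        self.owner = None

    def open(self, user: str) -> str:
        """Return the access mode granted to `user` when opening the item."""
        if self.owner is None:
            self.owner = user  # the first user to open the item locks it
        return "read-write" if self.owner == user else "read-only"

    def close(self, user: str) -> None:
        """Release the lock if `user` holds it (assumed behavior)."""
        if self.owner == user:
            self.owner = None


if __name__ == "__main__":
    lock = ItemLock()
    print(lock.open("alice"))
    print(lock.open("bob"))
```

Here the first caller of `open` receives read-and-write access, and any other user who opens the item while it is locked receives read-only access.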