Enabling the auto learning on data patterns - 8.0

Talend Data Catalog Administration Guide

Version: 8.0
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Catalog
Content: Administration and Monitoring, Data Governance
Last publication date: 2023-09-26

The data classification operation uses the data pattern to match data classes to imported objects based on the matching criteria.

When you approve or reject a learning data class, Talend Data Catalog absorbs the information and improves its understanding of the data pattern.

Before you begin

  • You have been assigned a global role with the Application Administration capability.
  • You have enabled the Auto Learning option in the data class properties.
  • You have sampled and profiled the data for the selected object.

Procedure

  1. Open the page of the object you want to use as a basis for learning.
  2. Assign the data class manually to that object.
  3. Go to MANAGE > Data Classes to open the properties of the learning data class.
    If you see numbers in blue next to the values in the Data Pattern area, it means that these values have been learned.
    Each number in blue is the percentage of data instances that matched that particular value, with a minimum of 10%.

    Talend Data Catalog picks up all the possible values or patterns that fit the percentage specified in the Matching threshold field.

    Data patterns with higher values in blue next to them are likely to be more accurate. You can adjust the list of possible values or patterns.
  4. Clear the Auto Learning check box to disable the option.
  5. Adjust the list of data patterns by removing the less accurate patterns.
  6. Save your changes.
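
The percentages shown in blue in step 3 can be pictured with a minimal Python sketch. It is not Talend Data Catalog code and does not describe the product's internal algorithm. The function name `learn_values`, the constant `MIN_SHARE`, and the reading of the Matching threshold as a lower bound on each value's share of the sample are assumptions made for illustration only.

```python
from collections import Counter

# Assumed floor, mirroring the 10% minimum mentioned in step 3.
MIN_SHARE = 10.0


def learn_values(sample, matching_threshold=MIN_SHARE):
    """Return {value: percentage} for the values whose share of the
    sample reaches the threshold.

    Hypothetical helper for illustration, not a Talend Data Catalog API.
    """
    if not sample:
        return {}
    # Never go below the assumed 10% floor, even if a lower threshold is given.
    cutoff = max(matching_threshold, MIN_SHARE)
    counts = Counter(sample)
    total = len(sample)
    shares = {value: 100.0 * n / total for value, n in counts.items()}
    return {value: round(pct, 1) for value, pct in shares.items() if pct >= cutoff}


# Made-up sample of ten country codes from a profiled column.
sample = ["FR"] * 5 + ["US"] * 4 + ["DE"] * 1
print(learn_values(sample, matching_threshold=30.0))  # {'FR': 50.0, 'US': 40.0}
```

In this sketch, raising the threshold keeps only the most frequent values, which corresponds to removing the less accurate patterns in step 5.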

Results

When you have a good set of patterns, you can invoke data classification on other objects to automatically associate the data class with these objects.
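
As a rough illustration of how a curated set of values might then be applied to another object, the following sketch checks what share of a column sample belongs to the learned set. The function `matches_data_class` and the way the threshold is applied are hypothetical. The actual matching criteria are defined in the data class properties in Talend Data Catalog.

```python
def matches_data_class(column_sample, learned_values, matching_threshold):
    """Return True when the share of sampled values found in the learned
    set reaches the threshold (a percentage between 0 and 100).

    Hypothetical helper for illustration, not a Talend Data Catalog API.
    """
    if not column_sample:
        return False
    hits = sum(1 for value in column_sample if value in learned_values)
    return 100.0 * hits / len(column_sample) >= matching_threshold


# Made-up values: 3 of the 4 sampled values (75%) are in the learned set.
learned = {"FR", "US"}
print(matches_data_class(["FR", "US", "FR", "DE"], learned, 70.0))  # True
```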