Enabling the auto learning on data patterns - Cloud

Talend Cloud Data Catalog Administration Guide

The data classification operation uses the data pattern to match data classes to imported objects based on the matching criteria.

When you approve or reject a learning data class, Talend Cloud Data Catalog absorbs the information and improves its understanding of the data pattern.

Before you begin

  • You have been assigned a global role with the Application Administration capability.
  • You have enabled the Auto Learning option in the data class properties.
  • You have sampled and profiled the data for the selected object.

Procedure

  1. Open the details page of the object you want to use as a basis to learn from.
  2. Assign the data class manually to that object.
  3. Go to MANAGE > Data Classes to open the properties of the learning data class.
    If you see numbers in blue next to the values in the Data Pattern area, it means that these values have been learned.
    Each number in blue is the percentage of data instances that matched that particular value, with a minimum of 10%.

    Talend Cloud Data Catalog picks up all the possible values or patterns that fit the percentage specified in the Matching threshold field.

    The data patterns with the highest values in blue next to them are likely to be the most accurate. You can adjust the list of possible values or patterns.
  4. Clear the Auto Learning check box to disable the option.
  5. Adjust the list of data patterns by removing the less accurate patterns.
  6. Save your changes.
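
The percentages described in step 3 can be pictured with a small sketch. This is only an illustration of the idea, not how Talend Cloud Data Catalog computes learned patterns internally: the function name `learn_patterns`, the parameter `threshold_pct`, and the sample data are all hypothetical. It counts how often each value occurs in a sample, converts the counts to percentages, and keeps only the values that reach the threshold (10% here, mirroring the minimum mentioned above).

```python
from collections import Counter


def learn_patterns(samples, threshold_pct=10.0):
    """Hypothetical helper: map each sampled value to its share of the
    sample (in percent), keeping only shares at or above threshold_pct.
    Results are ordered from the most frequent value to the least."""
    if not samples:
        return {}
    total = len(samples)
    counts = Counter(samples)
    shares = {value: 100.0 * n / total for value, n in counts.items()}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return {value: pct for value, pct in ranked if pct >= threshold_pct}


# Made-up sample of 20 profiled values for a country-code column.
samples = ["FR"] * 12 + ["DE"] * 6 + ["XX"] + ["N/A"]
learned = learn_patterns(samples, threshold_pct=10.0)
print(learned)
```

In this made-up sample, `FR` and `DE` pass the threshold while `XX` and `N/A` (5% each) are dropped. Values with low percentages are the kind of candidates you would review and remove when adjusting the list in step 5.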

Results

When you have a good set of patterns, you can invoke data classification on other objects to automatically associate the data class with these objects.