Managing data classes - Cloud

Talend Cloud Data Catalog Administration Guide

Data classification helps you detect, understand, and classify the nature and purpose of the elements in the data sources imported into your catalog.

You can classify imported objects with glossary terms to define these technical elements in business terms that everyone can understand. Data classification can also help you to find hidden relationships between these objects.

Talend Cloud Data Catalog helps you automate data identification and classification using its data profiling capability and data classes. This allows you to protect sensitive data automatically.

You can see and manage the existing data classes and create new ones from MANAGE > Data Classes.

Types of data classes

Talend Cloud Data Catalog helps you automatically identify and classify sensitive data, also referred to as personally identifiable information (PII), using the following types of data classes:
  • Data-detected classes detect the nature of data automatically based on predefined enumerations, patterns, and regular expressions. The data-detected classification uses the data sampling and profiling capability.
  • Metadata-detected classes detect classes by metadata attributes. They help you detect data that cannot be identified with the data-detected classification, such as dates of birth, which do not have a unique data pattern. The metadata-detected classification is powered by the Metadata Query Language (MQL) capability.
  • Compound classes are based on multiple metadata-detected and data-detected classes.
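
The difference between the three types can be sketched in a few lines of Python. This is a minimal illustration and not the Talend Cloud Data Catalog implementation: the `Column` type, the function names, the 0.8 match threshold, and the name rule are all hypothetical, the name rule is a plain regular expression and not MQL syntax, and combining the rules with a logical AND is only one possible reading of a compound class.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical model, for illustration only: none of these names come from
# Talend Cloud Data Catalog, and the name rule below is not MQL syntax.


@dataclass
class Column:
    name: str
    sample_values: List[str]


def data_detected(column: Column, pattern: str, threshold: float = 0.8) -> bool:
    """Match sampled values against a regular expression.

    The class is detected when the share of fully matching samples
    reaches the (assumed) threshold.
    """
    if not column.sample_values:
        return False
    regex = re.compile(pattern)
    hits = sum(1 for value in column.sample_values if regex.fullmatch(value))
    return hits / len(column.sample_values) >= threshold


def metadata_detected(column: Column, name_pattern: str) -> bool:
    """Match a metadata attribute (here, the column name) instead of the data."""
    return re.search(name_pattern, column.name, re.IGNORECASE) is not None


def compound(column: Column, data_pattern: str, name_pattern: str) -> bool:
    """Combine a data-detected rule and a metadata-detected rule (assumed AND)."""
    return data_detected(column, data_pattern) and metadata_detected(column, name_pattern)


DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"
BIRTH_NAME_PATTERN = r"birth|dob"

birth = Column("date_of_birth", ["1984-03-12", "1990-11-02", "2001-07-30"])
created = Column("created_on", ["2023-01-05", "2023-01-06", "2023-02-11"])

# Both columns contain date-shaped values, so the data alone cannot tell them apart.
print(data_detected(birth, DATE_PATTERN), data_detected(created, DATE_PATTERN))

# Adding the metadata rule singles out the date of birth.
print(
    compound(birth, DATE_PATTERN, BIRTH_NAME_PATTERN),
    compound(created, DATE_PATTERN, BIRTH_NAME_PATTERN),
)
```

In this sketch, a date of birth and a creation date share the same data pattern, which is why a rule on the column name is needed to separate them.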

You can use these data classes to profile data and to match the criteria by which sensitive data is hidden. Data-detected and metadata-detected classes share the same infrastructure for PII and data hiding.

Data-detected and metadata-detected classifications

The data-detected classification detects common data patterns automatically. It is less focused on providing definitions.

The metadata-detected classification provides authoritative and common definitions. It is more flexible but less precise than the data-detected classification.

Data classifications for imported objects

An imported object can have:
  • one definition or data-detected classification,
  • multiple metadata-detected classifications (relationships with business terms),
  • multiple proposed, approved and assigned data classifications (relationships with data classes).
For example, you can classify several imported objects that have different data types and patterns with the same business term.

We recommend that you make data classifications as precise as possible and keep a single approved or assigned data classification for each imported object.
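
As an illustration of these relationships, the following Python sketch models an imported object and checks the recommendation above. It is a hypothetical model and not the Talend Cloud Data Catalog data model or API: the class names, field names, and the `follows_recommendation` helper are assumptions made for this example, as are the sample object and data class names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

# Hypothetical model, for illustration only: these types are not part of
# Talend Cloud Data Catalog.


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    ASSIGNED = "assigned"


@dataclass
class DataClassification:
    data_class: str  # name of the related data class
    status: Status


@dataclass
class ImportedObject:
    name: str
    definition: Optional[str] = None  # at most one definition
    business_terms: List[str] = field(default_factory=list)  # several terms allowed
    data_classifications: List[DataClassification] = field(default_factory=list)

    def confirmed(self) -> List[DataClassification]:
        """Return the approved or assigned data classifications."""
        return [
            c
            for c in self.data_classifications
            if c.status in (Status.APPROVED, Status.ASSIGNED)
        ]

    def follows_recommendation(self) -> bool:
        """True when exactly one classification is approved or assigned."""
        return len(self.confirmed()) == 1


email = ImportedObject(
    name="CUSTOMER.EMAIL",
    business_terms=["Customer Email"],
    data_classifications=[
        DataClassification("Email", Status.APPROVED),
        DataClassification("URL", Status.PROPOSED),
    ],
)

print(email.follows_recommendation())
```

Proposed classifications do not count against the recommendation in this sketch. Only approved or assigned ones are considered.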

Semantic flow lineage

Talend Cloud Data Catalog uses data-detected and metadata-detected classifications to look up the inferred definition and related elements for the semantic flow lineage.