Data classification helps you to detect, understand and classify the nature and purpose of the elements contained in the data sources imported in your catalog.
You can classify imported objects with glossary terms to define these technical elements in business terms that everyone can understand. Data classification can also help you to find hidden relationships between these objects.
Talend Data Catalog helps you to automate the identification and data classification process using the data profiling capability and the data classes. It allows you to protect sensitive data automatically.
You can view and manage the existing data classes, and create new ones.
Types of data classes
- Data-detected classes detect the nature of data automatically, based on predefined enumerations, patterns and regular expressions. Data-detected classification relies on the data sampling and profiling capability.
- Metadata-detected classes detect classes from metadata attributes. They help you identify data that data-detected classification cannot, such as dates of birth, which have no distinctive data pattern. Metadata-detected classification is powered by MQL (Metadata Query Language).
- Compound classes are based on multiple metadata-detected and data-detected classes.
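To make the difference between the three types concrete, here is a minimal Python sketch. It is not Talend Data Catalog's API or detection engine: the class names, the 80% match threshold, matching metadata-detected classes against the column name only, and the rule that a compound class matches when any of its member classes is detected are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataDetectedClass:
    """Hypothetical data-detected class: inspects sampled values."""
    name: str
    pattern: Optional[str] = None         # regular expression a value must fully match
    enumeration: frozenset = frozenset()  # predefined list of allowed values
    min_match_ratio: float = 0.8          # assumed share of samples that must match

    def matches(self, samples: list) -> bool:
        values = [v.strip() for v in samples if v.strip()]
        if not values:
            return False
        allowed = {e.upper() for e in self.enumeration}
        regex = re.compile(self.pattern) if self.pattern else None
        hits = sum(
            1 for v in values
            if v.upper() in allowed or (regex is not None and regex.fullmatch(v))
        )
        return hits / len(values) >= self.min_match_ratio


@dataclass
class MetadataDetectedClass:
    """Hypothetical metadata-detected class: inspects the column name only."""
    name: str
    name_pattern: str  # regular expression searched in the column name, case-insensitively

    def matches(self, column_name: str) -> bool:
        return re.search(self.name_pattern, column_name, re.IGNORECASE) is not None


@dataclass
class CompoundClass:
    """Hypothetical compound class: matches when any member class was detected."""
    name: str
    members: list

    def matches(self, detected: set) -> bool:
        return any(m in detected for m in self.members)


def classify(column_name, samples, data_classes, metadata_classes, compound_classes) -> set:
    """Return the names of every class detected for one column."""
    detected = {c.name for c in data_classes if c.matches(samples)}
    detected |= {c.name for c in metadata_classes if c.matches(column_name)}
    # Compound classes are evaluated last, on top of the simple classes found so far.
    detected |= {c.name for c in compound_classes if c.matches(detected)}
    return detected


email = DataDetectedClass("Email", pattern=r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
country = DataDetectedClass("Country Code", enumeration=frozenset({"FR", "US", "DE"}))
birth_date = MetadataDetectedClass("Date of Birth", name_pattern=r"birth|dob")
contact = CompoundClass("Contact Information", members=["Email", "Phone"])

print(sorted(classify("CUST_EMAIL", ["a@x.com", "b@y.org"], [email, country], [birth_date], [contact])))
# ['Contact Information', 'Email']
print(sorted(classify("DATE_OF_BIRTH", ["1990-01-01"], [email, country], [birth_date], [contact])))
# ['Date of Birth']
```

The second call shows why metadata-detected classes exist: the sampled value looks like any other date, so only the column name reveals that it is a date of birth.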
You can use these data classes to profile data and to match the criteria that determine which sensitive data is hidden. Data-detected and metadata-detected classes share the same infrastructure for handling personally identifiable information (PII) and hiding data.
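The shared infrastructure means the hiding decision only needs the set of detected classes, not how each class was detected. The following Python sketch illustrates that idea; the `SENSITIVE_CLASSES` set, the `hide_sensitive` function and the fixed `****` mask are hypothetical and do not reflect how Talend Data Catalog actually masks values.

```python
# Hypothetical set of data classes whose values must be hidden.
SENSITIVE_CLASSES = {"Email", "Date of Birth"}


def hide_sensitive(detected_classes: set, values: list, mask: str = "****") -> list:
    """Mask every value of a column that carries at least one sensitive data class.

    The check does not care whether a class was data-detected or
    metadata-detected: both kinds feed the same decision.
    """
    if detected_classes & SENSITIVE_CLASSES:
        return [mask] * len(values)
    return list(values)


print(hide_sensitive({"Date of Birth"}, ["1990-01-01", "1985-06-30"]))  # ['****', '****']
print(hide_sensitive({"Country Code"}, ["FR", "US"]))                    # ['FR', 'US']
```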
Data-detected and metadata-detected classifications
The data-detected classification detects common data patterns automatically. It is less focused on providing definitions.
The metadata-detected classification provides authoritative and common definitions. It is more flexible and less precise than the data-detected classification.
Data classifications for imported objects
An imported object can have:
- one definition or data-detected classification,
- multiple metadata-detected classifications (relationships with business terms),
- multiple proposed, approved and assigned data classifications (relationships with data classes).
Keep data classifications as precise as possible: ideally, each imported object has exactly one approved or assigned data classification.
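The cardinalities above and the one-approved-or-assigned recommendation can be modeled as a small record with a check. This Python sketch is only an illustration: the `ImportedObject` and `Status` types and their field names are hypothetical, not Talend Data Catalog's data model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    ASSIGNED = "assigned"


@dataclass
class ImportedObject:
    """Hypothetical record of the classifications attached to one imported object."""
    name: str
    definition: Optional[str] = None                    # one definition or data-detected classification
    business_terms: list = field(default_factory=list)  # metadata-detected, many allowed
    data_classes: dict = field(default_factory=dict)    # data class name -> Status

    def settled_classes(self) -> list:
        """Data classes that are approved or assigned, as opposed to merely proposed."""
        return [n for n, s in self.data_classes.items()
                if s in (Status.APPROVED, Status.ASSIGNED)]

    def follows_recommendation(self) -> bool:
        """True when exactly one data classification is approved or assigned."""
        return len(self.settled_classes()) == 1


column = ImportedObject(
    "CUSTOMER.EMAIL",
    business_terms=["Customer Email"],
    data_classes={"Email": Status.APPROVED, "Contact Information": Status.PROPOSED},
)
print(column.settled_classes())         # ['Email']
print(column.follows_recommendation())  # True
```

Proposed classifications do not count against the recommendation here, since they are still awaiting review.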
Semantic flow lineage
Talend Data Catalog uses data-detected and metadata-detected classifications to look up the inferred definition and related elements used in semantic flow lineage.