
Data Class Discovery

Talend Data Catalog has a concept of data classes. A data class may be applied like a tag to a column-level object (for example, a column in a database or a field in a file) to indicate what kind of data that object holds, e.g., Social Security Number or Gender. In this way, you may categorize objects by data class and then identify, sort, and operate on all objects of the same type.

You may manually assign data classes to an object from the object's page or when browsing in grid mode. In addition, as part of the harvesting and data profiling process, Talend Data Catalog suggests data class assignments, which you may confirm and make permanent.

Information note

Data classes were previously referred to as semantic types. With the inclusion of metadata-detected data classes and other improvements, the concept has been generalized into the data class, and all data classification is now based on data classes.

Steps

  1. Ensure that you have specified the appropriate data sampling and profiling options before harvesting.
  2. Navigate to the object page for the object you wish to work with.
Information note

You may also review and edit data class assignments in grid mode. However, they cannot be assigned in bulk.

  3. Review the data classes that Talend Data Catalog has proposed.
  4. To confirm a proposed data class, click the check.
  5. To reject a data class, click the X.
Information note

Rejecting a data class proposal is permanent: it will not be suggested again in future harvests. You may, however, still assign it manually later.

  6. To specify a data class that is not currently assigned, click in the box and start typing. A pull-down list of valid data classes is provided to pick from.

Example

Navigate to the object page of the Gender field in the Employee.csv file.

There are two suggested data classes. Confirm the Gender type by clicking the check mark next to that type. Then reject the Civility type by clicking the X next to it.

Information note

You will receive a warning that this action is permanent.

Are you sure you want to reject Civility data class?

It will not be proposed again for this object if you reject it but you will still be able to manually add it.

And the result is a single confirmed type.
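The confirm and reject behavior in this example can be illustrated with a small model. This is only a sketch under stated assumptions, not Talend Data Catalog's implementation or API: the `Classification` class and its `harvest`, `confirm`, `reject`, and `assign` methods are hypothetical names invented for illustration. It shows that a rejected proposal is remembered, so a later harvest does not propose it again, while manual assignment remains possible.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Hypothetical model of the data class state of one object."""
    proposed: set = field(default_factory=set)
    confirmed: set = field(default_factory=set)
    rejected: set = field(default_factory=set)

    def harvest(self, detected):
        # Propose only classes that are neither confirmed nor rejected.
        self.proposed = set(detected) - self.confirmed - self.rejected

    def confirm(self, name):
        # Clicking the check: the proposal becomes a confirmed class.
        self.proposed.discard(name)
        self.confirmed.add(name)

    def reject(self, name):
        # Clicking the X: the rejection is remembered permanently.
        self.proposed.discard(name)
        self.rejected.add(name)

    def assign(self, name):
        # Manual assignment is allowed even for a rejected class
        # (assumption: the rejection record itself is left untouched).
        self.confirmed.add(name)


gender = Classification()
gender.harvest({"Gender", "Civility"})  # two suggested data classes
gender.confirm("Gender")
gender.reject("Civility")
gender.harvest({"Gender", "Civility"})  # a later harvest detects both again
```

After the second harvest in this sketch, `gender.proposed` is empty and `gender.confirmed` contains only `Gender`; calling `gender.assign("Civility")` would still add Civility manually.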

Explore Further

Valid Data Classes

The available set of data classes is strictly controlled, so you may not simply type in a new one when assigning a data class to an object. A data class definition is more than just a name: it includes rules to match against (textual pattern-matching rules or a list of valid values).
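To make the two kinds of matching rules concrete, here is a minimal sketch of a data class defined either by a textual pattern or by a list of valid values. Everything in it is an assumption for illustration: the `DataClass` type, the `match_ratio` helper, the regular expression, and the value list are hypothetical and do not reflect how Talend Data Catalog stores or evaluates its definitions.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataClass:
    """Hypothetical data class: a name plus one matching rule."""
    name: str
    pattern: Optional[str] = None               # textual pattern rule
    valid_values: FrozenSet[str] = frozenset()  # list-of-values rule

    def matches(self, value: str) -> bool:
        if self.pattern is not None:
            # The whole value must match the pattern.
            return re.fullmatch(self.pattern, value) is not None
        # Otherwise compare case-insensitively against the value list.
        return value.strip().lower() in self.valid_values


def match_ratio(data_class: DataClass, samples: list) -> float:
    """Fraction of sampled values that match the data class."""
    if not samples:
        return 0.0
    return sum(data_class.matches(s) for s in samples) / len(samples)


# Illustrative definitions only; real definitions may differ.
SSN = DataClass("US Social Security Number", pattern=r"\d{3}-\d{2}-\d{4}")
GENDER = DataClass(
    "Gender", valid_values=frozenset({"m", "f", "male", "female"})
)
```

In a sketch like this, a profiler could suggest a data class for a column when `match_ratio` over the sampled values exceeds some threshold; the threshold and the suggestion logic are likewise assumptions.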

The current set of valid data classes may be reviewed, edited and removed using the manage data classes feature.

Hiding profiling and sample data by data class

You may ensure that sample and profile data are hidden from the casual user by setting a Hide flag on an object.

In addition to manually setting this value, you may also define a data class to hide the data sampled and profiled on subsequent harvests. Thus, e.g., you could define the data class US Social Security Number to be hidden for all objects of that data class. Then, as the data is profiled in subsequent harvests, and Talend Data Catalog determines that an element is of that data class, its flag will be set to hidden. Go to manage data classes to manage this feature.
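The propagation described above can be sketched as follows. This is an illustrative model only: `CatalogObject`, `apply_hide_rules`, and the example column names are hypothetical and are not part of any Talend Data Catalog API. It assumes that a harvest sets the flag on any object whose data classes include a class marked as hidden, and that a flag set manually is never cleared by this step.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogObject:
    """Hypothetical column-level object with a Hide flag."""
    name: str
    data_classes: set = field(default_factory=set)
    hidden: bool = False


def apply_hide_rules(objects, hidden_classes):
    """Set the Hide flag on objects carrying a hidden data class."""
    for obj in objects:
        if obj.data_classes & hidden_classes:
            obj.hidden = True  # only ever sets the flag, never clears it
    return objects


columns = [
    CatalogObject("ssn", {"US Social Security Number"}),
    CatalogObject("gender", {"Gender"}),
    CatalogObject("notes", set(), hidden=True),  # hidden manually
]
apply_hide_rules(columns, {"US Social Security Number"})
```

Here `ssn` becomes hidden because of its data class, `gender` stays visible, and `notes` keeps its manually set flag.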
