Data-detected Data Classes

A data class is defined to identify a data pattern. You can define the data pattern manually or ask the application to learn it automatically from the data and approval actions.

A field or column may be assigned one or more Data-detected and/or Metadata-detected Data Classes. Once a data class is assigned, it is a property of the element: you can search on it, filter by it, and edit it further (for example, remove the assignment).

Information note

For example, the field in the example above is assigned the data class Gender.

Data classes are drawn from a pool that is defined repository-wide. Each class in the pool has a unique name and one of the following:

  • Enumeration - list of valid values, e.g., Red, Blue, Green.
  • Pattern - list of possible patterns, generally discovered by the software, e.g., A{2}9{3}-9{3}-9{3}
  • Regular Expression - syntactical rule set.

These definitions are used to infer the data classification of objects by sampling and profiling the data.
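The three kinds of definition can be sketched as simple value matchers. This is only an illustration, not the application's implementation: the function names are hypothetical, and the reading of the pattern notation (A for one letter, 9 for one digit, {n} for a repeat count, anything else a literal) is an assumption inferred from the example pattern above.

```python
import re

def pattern_to_regex(pattern: str) -> str:
    """Translate the assumed pattern notation into a regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "A":            # assumption: "A" stands for one letter
            out.append("[A-Za-z]")
        elif ch == "9":          # assumption: "9" stands for one digit
            out.append("[0-9]")
        elif ch == "{":          # copy a "{n}" repeat count unchanged
            end = pattern.index("}", i)
            out.append(pattern[i:end + 1])
            i = end
        else:                    # everything else is a literal character
            out.append(re.escape(ch))
        i += 1
    return "".join(out)

def enumeration_matcher(values):
    allowed = set(values)
    return lambda v: v in allowed

def pattern_matcher(patterns):
    compiled = [re.compile(pattern_to_regex(p)) for p in patterns]
    return lambda v: any(c.fullmatch(v) for c in compiled)

def regex_matcher(expression):
    compiled = re.compile(expression)
    return lambda v: compiled.search(v) is not None

color = enumeration_matcher(["Red", "Blue", "Green"])
code = pattern_matcher(["A{2}9{3}-9{3}-9{3}"])

print(color("Blue"), color("Purple"))                  # True False
print(code("AB123-456-789"), code("A1123-456-789"))    # True False
```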

Some actions can apply to all objects of a given data class, in particular the Hide/Show property.

Edit a Data-detected Data Class

Steps

  1. Manage data classes.
  2. If the data class does not yet exist, add the data class.
  3. You may edit all the properties that data classes have in common.
Information note

You may not edit the Type after it has been set. You must create a new data class instead.

  4. Set the MATCHING THRESHOLD to specify the minimum percentage of values, among all values of that field/column, that must match any of the enumeration values, patterns or regular expression.
  5. Set the UNIQUENESS THRESHOLD to specify the minimum number of unique values among all values of that field/column. This ensures that the data set is diverse enough.
Information note

By default, the UNIQUENESS THRESHOLD is set to 1 for enumerations (and is capped at the number of enumeration values) and to 6 otherwise.

  6. Enter the DATA PATTERN, which may be one of the following:
    • Enumeration: a list of values for the data to match.
    • Pattern: a list of patterns for the data to match.
    • Regular Expression: a RegEx format expression for the data to match.
  7. Click SAVE.

Usage

To understand these settings, consider an all-women’s college whose student database has thousands of rows that all contain Female in the Gender column. In this case, the UNIQUENESS THRESHOLD should be set to 1 so that the column can match the Gender data class.
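The effect of the two thresholds on this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's actual logic: the function name `is_detected`, the use of "greater than or equal to" for both comparisons, and the matching threshold of 80% are all hypothetical.

```python
def is_detected(values, matches, matching_threshold_pct, uniqueness_threshold):
    """Return True when the sampled values satisfy both thresholds."""
    if not values:
        return False
    matched = sum(1 for v in values if matches(v))
    matching_pct = 100.0 * matched / len(values)   # share of matching values
    unique_count = len(set(values))                # diversity of the column
    return (matching_pct >= matching_threshold_pct
            and unique_count >= uniqueness_threshold)

gender_values = {"Male", "Female"}
column = ["Female"] * 1000   # every row holds the same value

# One unique value satisfies a uniqueness threshold of 1 ...
print(is_detected(column, gender_values.__contains__, 80, 1))  # True
# ... but not a threshold of 2, even though 100% of the values match.
print(is_detected(column, gender_values.__contains__, 80, 2))  # False
```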

The International Gender enumeration data class has Male and Female values in different languages. When a column uses Male and Female values in only one language, the application matches it with less than 100% confidence because of the values from the other languages. Use “International” data classes with care, and only when your columns are truly multilingual. Otherwise, define a data class for each language used and group them in an “International” compound data class. For example:

  • English Gender (enumeration): Male, Female
  • French Gender (enumeration): Mâle, Femelle
  • International Gender (compound): English Gender, French Gender
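The grouping above can be modeled as a compound class that holds its per-language members. The source does not describe how the application evaluates a compound class or computes confidence, so the sketch below is purely an assumption: it scores each member separately and reports the best one. All names are hypothetical.

```python
ENGLISH_GENDER = {"Male", "Female"}
FRENCH_GENDER = {"Mâle", "Femelle"}

# Hypothetical model: a compound class is a named group of member classes.
INTERNATIONAL_GENDER = {
    "English Gender": ENGLISH_GENDER,
    "French Gender": FRENCH_GENDER,
}

def match_share(values, allowed):
    """Percentage of values found in one member enumeration."""
    if not values:
        return 0.0
    return 100.0 * sum(v in allowed for v in values) / len(values)

def best_member(values, compound):
    """Score every member separately and return the best (name, share)."""
    shares = {name: match_share(values, allowed)
              for name, allowed in compound.items()}
    name = max(shares, key=shares.get)
    return name, shares[name]

column = ["Male", "Female", "Female", "Male"]
print(best_member(column, INTERNATIONAL_GENDER))  # ('English Gender', 100.0)
```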

When the matching rule is Enumeration and the number of its possible values is less than the value specified in the UNIQUENESS THRESHOLD, the application uses the number of possible values as the UNIQUENESS THRESHOLD.
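This capping rule, together with the defaults of 1 for enumerations and 6 otherwise, amounts to taking a minimum. The helper below is a hypothetical sketch of that arithmetic; its name, parameters and rule-type strings are assumptions.

```python
from typing import Optional

def effective_uniqueness_threshold(rule_type: str,
                                   configured: Optional[int] = None,
                                   enum_size: Optional[int] = None) -> int:
    """Apply the documented defaults and the enumeration cap."""
    is_enum = rule_type == "enumeration"
    if configured is None:
        configured = 1 if is_enum else 6   # documented defaults
    if is_enum and enum_size is not None:
        # An enumeration can never yield more unique values than it lists.
        return min(configured, enum_size)
    return configured

print(effective_uniqueness_threshold("enumeration", configured=6, enum_size=2))  # 2
print(effective_uniqueness_threshold("enumeration", enum_size=3))                # 1
print(effective_uniqueness_threshold("regex"))                                   # 6
```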

Example

Sign in as Administrator and go to MANAGE > Data Classes.

Enter “Product” in the Search box.

Click the line for the Product Number RegEx class.

Click the Regular Expression radio button and enter “^\D{2}-\d{4}$” as the first line in the DATA PATTERN box. Select “20” in the MATCHING THRESHOLD (%) box. Click SAVE.
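To see what this expression accepts, it can be tried against a few values. The sketch below uses Python's `re` module as a stand-in, since the source does not state which regular expression engine the application uses; the sample values are invented for illustration.

```python
import re

PRODUCT_NUMBER = re.compile(r"^\D{2}-\d{4}$")
MATCHING_THRESHOLD_PCT = 20

# Hypothetical column sample: two well-formed product numbers among ten values.
sample = ["AB-1234", "xy-0042", "12-3456", "ABC-123", "AB-12345",
          "n/a", "", "unknown", "TBD", "none"]

matched = [v for v in sample if PRODUCT_NUMBER.search(v)]
share = 100.0 * len(matched) / len(sample)

print(matched)                           # ['AB-1234', 'xy-0042']
print(share >= MATCHING_THRESHOLD_PCT)   # True
```

Note that `\D` matches any non-digit character, not only letters, so a value such as "??-1234" would also match this expression.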
