PII Data classes

Talend Data Catalog ships with a fairly robust set of predefined data classes, many of which may be considered PII. The first step is to identify which ones should be treated as PII.

Create a Compound PII Data class

Compound data classes may be defined as the union of other data-detected and/or metadata-detected data classes. In this case, PII will be a compound data class made up of several data-detected and metadata-detected data classes that are categorized as personally identifiable information and whose data should be hidden by default.

Go to MANAGE > Data classes.

Looking at the list, at least five data classes could be PII:

Address Line
Gender
Last Name
US Postal Code
US Social Security Number

There may be others, but we will start with these for this exercise.

A powerful feature of data classes is the ability to auto-tag elements and auto-hide their data at the same time. To take advantage of this, we will create a new PII data class that is a compound of the PII-related data classes listed above.
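As a rough illustration of the union idea, the following Python sketch models how a compound class could be applied alongside a detected member class. It is a conceptual model only, not Talend Data Catalog code or its API: the names `CompoundDataClass` and `effective_tags` are hypothetical, and the tagging rule is an assumption drawn from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundDataClass:
    """Hypothetical model of a compound data class: a name plus its member classes."""
    name: str
    members: frozenset


# The five data classes chosen for this exercise.
PII = CompoundDataClass(
    name="PII",
    members=frozenset({
        "Address Line",
        "Gender",
        "Last Name",
        "US Postal Code",
        "US Social Security Number",
    }),
)


def effective_tags(detected, compounds):
    """Return the detected classes plus every compound that contains one of them.

    Assumption: a field matching any member of a compound is also tagged
    with the compound itself (the "union" behavior).
    """
    detected = set(detected)
    tags = set(detected)
    for compound in compounds:
        if detected & compound.members:
            tags.add(compound.name)
    return tags


if __name__ == "__main__":
    # A field detected as a Social Security Number also receives the PII tag.
    print(sorted(effective_tags({"US Social Security Number"}, [PII])))
```

In this model, a field that matches none of the member classes keeps only its own detected tags.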

Click +Add and enter the following:

The data class is a Compound data class consisting of the five types in the earlier list.

Specify Confidential as the DEFAULT SENSITIVITY and associate several PII-type data classes in the COMPOUND TYPES selection box.

Add all the data classes identified above.

Click SAVE.

The Hide Data setting is assigned to this sensitivity label (e.g., Classified). This way, when a data element is tagged with the PII data class, its Sensitivity Label will be Classified, and its data will be hidden from casual users who do not have the Data Management capability object role assignment.
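The visibility rule described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the product's actual access-control logic: `HIDE_DATA_LABELS` and `can_see_data` are hypothetical names, and the label and capability names are taken from the example in the preceding paragraph.

```python
# Assumption: sensitivity labels for which the Hide Data setting is enabled.
HIDE_DATA_LABELS = {"Classified"}

# Assumption: the capability that allows a user to see hidden data.
UNHIDE_CAPABILITY = "Data Management"


def can_see_data(sensitivity_label, user_capabilities):
    """Return True if a user with the given capabilities may see the data.

    Data is hidden only when the element's sensitivity label has Hide Data
    enabled and the user lacks the Data Management capability.
    """
    if sensitivity_label not in HIDE_DATA_LABELS:
        return True
    return UNHIDE_CAPABILITY in user_capabilities


if __name__ == "__main__":
    print(can_see_data("Classified", {"Data Viewer"}))      # casual viewer
    print(can_see_data("Classified", {"Data Management"}))  # data manager
```

Under this model, a user holding only the Data Viewer capability is still denied, which matches the behavior shown later in this exercise.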

Harvest with the Data class

Go to MANAGE > Configuration, select the Data Lake model, and go to the Import Options tab. Note that this model is configured for Data Profiling and Sampling.

Click Import and be sure to check FULL SOURCE IMPORT INSTEAD OF INCREMENTAL to ensure that the cached copy is not simply reused.

Once the import has completed and the data has been profiled and sampled (check the Logs tab), we can see what was profiled and auto-tagged (and thus auto-hidden).

Analyze the Auto Tagging and Hiding Results

Go to WORKSHEETS > File > Fields.

Add the Data Classifications column to the Grid view and filter on Data Classifications = PII:

The result lists all the auto-tagged PII fields.

Click SSN to go to its object page.

This field was tagged as both US Social Security Number and PII, because PII is a compound of several data classes, including US Social Security Number.

If we sign in as a casual user, or even as a user with the Data Viewer capability object role assignment (e.g., Dan), we cannot see the profiling information:

This demonstrates the auto-hiding feature.
