tDataMasking Standard properties - 7.0

Data privacy

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

These properties are used to configure tDataMasking running in the Standard Job framework.

The Standard tDataMasking component belongs to the Data Quality family.

The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

Click Sync columns to retrieve the schema from the previous component connected in the Job.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

The output schema of this component contains one read-only column, ORIGINAL_MARK. This column indicates whether a record is an original record (true) or a substitute record (false).

 

Built-In: You create and store the schema locally for this component only.

 

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Modifications

Define in the table what fields to change and how to change them:

Input Column: Select the column from the input flow for which you want to generate similar data by modifying its values.

These modifications are based on the function you select in the Function column and the number of modifications you set in the Max Modification Count column.

Function: Select the function that determines how values are modified to generate similar substitute data. For example, you can generate similar values by replacing or adding letters or numbers, replace values with synonyms from an index file, or delete values by setting the function to null.

The Function list will vary according to the column type. For further information about function behavior, see Function behavior in common PII.

For example, a column of the Long type has a Numeric variance option in the list, while a column of the String type does not have this function. Likewise, the Function list for a Date column is date-specific: it lets you choose the type of modification to apply to date values.

Extra Parameter: This field is used by some of the functions and is disabled when it does not apply. When it applies, enter a number or a letter to control the behavior of the function you have selected.

Keep format: This option applies only to String columns. Select this check box to keep the input format when using any of these functions: Generate unique SSN number, Generate account number and keep original country, or Generate credit card number and keep original bank. That is to say, if there are spaces, dots ('.'), hyphens ('-') or slashes ('/') in the input, the output keeps the same characters.
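The effect of Keep format can be pictured with a minimal sketch. This is not the component's implementation: the class KeepFormatSketch and the method maskDigits are hypothetical names, and replacing each digit with a random digit is a simplification of the actual generation functions. The sketch only shows how spaces, dots, hyphens and slashes can be carried over to the output or dropped.

```java
import java.util.Random;

// Hypothetical illustration only, not Talend's implementation.
class KeepFormatSketch {

    // Replaces every digit with a random digit. When keepFormat is true,
    // the separators (space, '.', '-', '/') are copied through unchanged;
    // when it is false, they are left out of the output.
    static String maskDigits(String input, boolean keepFormat, long seed) {
        Random random = new Random(seed);
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isDigit(c)) {
                out.append((char) ('0' + random.nextInt(10)));
            } else if (keepFormat && " .-/".indexOf(c) >= 0) {
                out.append(c);
            }
            // Any other character is dropped in this sketch.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same length and hyphen positions as the input.
        System.out.println(maskDigits("123-45-6789", true, 12345678L));
        // Digits only, separators removed.
        System.out.println(maskDigits("123-45-6789", false, 12345678L));
    }
}
```

With keepFormat set to true, the masked value has hyphens in the same positions as the input; with false, only the nine generated digits remain.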

Advanced settings

Seed for random generator

Enter a number if you want to generate the same sample of substitute data each time the Job is executed. This field is set to 12345678 by default.

Repeating the execution with a different value for this field will result in a different sample being generated. Keep this field empty if you want to generate a different sample each time you execute the Job.
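The role of the seed can be illustrated with the standard java.util.Random class. This is a sketch under assumptions: the class SeedSketch and the method sample are hypothetical, and the component's own generator may differ. A null seed stands for an empty field here.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only, not Talend's implementation.
class SeedSketch {

    // Draws 'count' pseudo-random values. A fixed seed reproduces the same
    // sequence on every call; a null seed (empty field) lets the generator
    // pick its own seed, so the sequence changes between runs.
    static int[] sample(Long seed, int count) {
        Random random = (seed == null) ? new Random() : new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(1000);
        }
        return values;
    }

    public static void main(String[] args) {
        // These two lines print identical arrays.
        System.out.println(Arrays.toString(sample(12345678L, 5)));
        System.out.println(Arrays.toString(sample(12345678L, 5)));
        // This line is expected to change from one execution to the next.
        System.out.println(Arrays.toString(sample(null, 5)));
    }
}
```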

Output the original row

Select this check box to output original data rows in addition to the substitute data. Having both data rows can be useful in debug or test processes.
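As a rough picture of the resulting flow, the sketch below emits each original row marked true in ORIGINAL_MARK, followed by its substitute marked false. The names OriginalRowSketch, Row and process are hypothetical, the masking function is passed in as a placeholder, and the order in which the component actually emits the two rows is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only, not Talend's implementation.
class OriginalRowSketch {

    // One output record: a value plus the ORIGINAL_MARK flag.
    static final class Row {
        final String value;
        final boolean originalMark;

        Row(String value, boolean originalMark) {
            this.value = value;
            this.originalMark = originalMark;
        }
    }

    // Emits a substitute row for every input, preceded by the original row
    // when outputOriginalRow is true (the ordering is an assumption).
    static List<Row> process(List<String> inputs, boolean outputOriginalRow,
                             UnaryOperator<String> mask) {
        List<Row> out = new ArrayList<>();
        for (String input : inputs) {
            if (outputOriginalRow) {
                out.add(new Row(input, true));
            }
            out.add(new Row(mask.apply(input), false));
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder masking function: replaces every character with 'X'.
        UnaryOperator<String> mask = s -> s.replaceAll(".", "X");
        for (Row row : process(Arrays.asList("alice", "bob"), true, mask)) {
            System.out.println(row.value + " | ORIGINAL_MARK=" + row.originalMark);
        }
    }
}
```

Filtering on the ORIGINAL_MARK flag downstream then separates the original rows from the substitutes.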

Should null input return null

This check box is selected by default. When selected, the component outputs null when input values are null. Otherwise, it returns the default value when the input is null, that is, an empty string for string values, 0 for numeric values and the current date for date values.

This parameter has no effect on the Generate Sequence function. If the input is null, this function does not return null, even if the check box is selected.
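The null-handling rule can be summarized in a small sketch. The names NullInputSketch, ColumnType and valueForNullInput are hypothetical, and the sketch deliberately leaves out the Generate Sequence exception described above.

```java
import java.util.Date;

// Hypothetical illustration only, not Talend's implementation.
class NullInputSketch {

    enum ColumnType { STRING, NUMERIC, DATE }

    // Returns what a null input produces: null when the check box is
    // selected, otherwise the default value for the column type.
    static Object valueForNullInput(ColumnType type, boolean nullReturnsNull) {
        if (nullReturnsNull) {
            return null;
        }
        switch (type) {
            case STRING:
                return "";          // empty string for string values
            case NUMERIC:
                return 0;           // 0 for numeric values
            default:
                return new Date();  // current date for date values
        }
    }

    public static void main(String[] args) {
        System.out.println(valueForNullInput(ColumnType.STRING, true));
        System.out.println(valueForNullInput(ColumnType.NUMERIC, false));
        System.out.println(valueForNullInput(ColumnType.DATE, false));
    }
}
```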

Should empty input return empty

When this check box is selected, the component returns the input values if they are empty. Otherwise, the selected functions are applied to the input data.
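A minimal sketch of this rule follows. The class EmptyInputSketch and the method apply are hypothetical names, and the masking function is passed in as a placeholder rather than being one of the component's functions.

```java
import java.util.function.UnaryOperator;

// Hypothetical illustration only, not Talend's implementation.
class EmptyInputSketch {

    // Returns an empty input unchanged when emptyReturnsEmpty is true;
    // in every other case the selected function is applied to the input.
    static String apply(String input, boolean emptyReturnsEmpty,
                        UnaryOperator<String> function) {
        if (emptyReturnsEmpty && input != null && input.isEmpty()) {
            return input;
        }
        return function.apply(input);
    }

    public static void main(String[] args) {
        // Placeholder function that always returns a fixed substitute.
        UnaryOperator<String> function = s -> "MASKED";
        System.out.println("[" + apply("", true, function) + "]");
        System.out.println("[" + apply("", false, function) + "]");
        System.out.println("[" + apply("abc", true, function) + "]");
    }
}
```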

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage

Usage rule

This component is an intermediary step. It requires an input flow and an output flow.