tDataMasking properties - 6.1

Talend Components Reference Guide

Component family

Data Quality

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Sync columns to retrieve the schema from the previous component connected in the Job.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

The output schema of this component contains one read-only column, ORIGINAL_MARK. This column indicates whether a record is an original record (true) or a substitute record (false).

Built-In: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.

Modification

Define in the table what fields to change and how to change them:

Input Column: Select the column from the input flow for which you want to generate similar data by modifying its values.

These modifications are based on the function you select in the Function column and the number of modifications you set in the Max Modification Count column.

Function: Select the function that determines which modification is applied to generate similar substitute data. For example, you can generate similar values by replacing or adding letters or numbers, replace values with synonyms from an index file, or delete values by setting the function to null.

The Function list will vary according to the column type. For further information about function behavior, see Function behavior in common PII.

For example, a column of the Long type has a Numeric variance option in the list, while a column of the String type has no such function. The Function list for a Date column is date-specific: it lets you decide which type of modification to apply to date values.

Parameter: This field is used by some of the functions and is disabled when it does not apply. When it applies, enter a number or a letter to control the behavior of the function you have selected.
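
To illustrate how a function and its parameter can work together, here is a minimal Java sketch of a numeric variance on a Long value. It is not the component's implementation. The class name, the method name, and the reading of the parameter as a maximum percentage of variation are all assumptions made for this example.

```java
import java.util.Random;

// Hypothetical sketch: NOT the tDataMasking implementation.
final class NumericVarianceSketch {

    // Assumption: the parameter is read as a maximum percentage, and the
    // value is shifted by a random whole percentage in [-maxPercent, +maxPercent].
    static long numericVariance(long value, int maxPercent, Random rng) {
        int pct = rng.nextInt(2 * maxPercent + 1) - maxPercent;
        return value + Math.round(value * pct / 100.0);
    }

    public static void main(String[] args) {
        Random rng = new Random(12345678L);
        // Vary a salary-like value by at most 10 percent in either direction.
        System.out.println(numericVariance(1000L, 10, rng));
    }
}
```

Under these assumptions, a parameter of 0 leaves the value unchanged, and larger parameters widen the range of possible substitutes.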

Advanced settings

Seed for random generator

Set a random number if you want to generate the same sample of substitute data in each execution of the Job. This field is set to 12345678 by default.

Repeating the execution with a different value for this field will result in a different sample being generated. Keep this field empty if you want to generate a different sample each time you execute the Job.
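
The effect of the seed can be illustrated with a plain java.util.Random, as in the sketch below. This assumes the component relies on a seeded pseudo-random generator in a comparable way, which the documentation does not state. The class and method names are hypothetical, and a null seed stands in for an empty field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of seeded versus unseeded generation.
final class SeedSketch {

    // Draws `count` random digits. A null seed models an empty Seed field,
    // in which case the generator is seeded differently on each run.
    static List<Integer> sample(Long seed, int count) {
        Random rng = (seed == null) ? new Random() : new Random(seed);
        List<Integer> digits = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            digits.add(rng.nextInt(10));
        }
        return digits;
    }

    public static void main(String[] args) {
        // Same seed: the two lists are identical on every execution.
        System.out.println(sample(12345678L, 8));
        System.out.println(sample(12345678L, 8));
        // No seed: the list usually changes from one execution to the next.
        System.out.println(sample(null, 8));
    }
}
```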

 

Output the original row

Select this check box to output original data rows in addition to the substitute data. Having both data rows can be useful in debug or test processes.
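
When original and substitute rows are both output, the ORIGINAL_MARK column makes it possible to tell them apart downstream. The following Java sketch shows the idea on an in-memory list. The Row class and the substitutesOnly helper are hypothetical names for this example and are not part of the code generated by the Studio.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: separating substitute rows from original rows.
final class OriginalMarkSketch {

    // Simplified row: one data value plus the ORIGINAL_MARK flag.
    static final class Row {
        final String value;
        final boolean originalMark; // true = original, false = substitute

        Row(String value, boolean originalMark) {
            this.value = value;
            this.originalMark = originalMark;
        }
    }

    // Keeps only the rows whose ORIGINAL_MARK is false.
    static List<Row> substitutesOnly(List<Row> rows) {
        List<Row> result = new ArrayList<>();
        for (Row row : rows) {
            if (!row.originalMark) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("Alice", true));   // original record
        rows.add(new Row("Xkqwe", false));  // made-up substitute value
        for (Row row : substitutesOnly(rows)) {
            System.out.println(row.value);
        }
    }
}
```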

 

Should null input returns null

This check box is selected by default. When it is selected, the component outputs null when the input value is null. When it is cleared, the component returns a default value for null input: an empty string for string values, 0 for numeric values, and the current date for date values.

This parameter does not have an effect on the Generate Sequence function. If the input is null, this function will not return null, even if the box is checked.
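
The null-handling rule described above can be summarized in a short Java sketch. The class and method names are hypothetical. The sketch does not model the Generate Sequence exception, and returning null for other column types is an assumption, since the documentation only lists strings, numbers, and dates.

```java
import java.util.Date;

// Hypothetical sketch of the documented defaults for null input.
final class NullHandlingSketch {

    // Returns the value output for a null input of the given column type.
    static Object resolveNull(Class<?> columnType, boolean nullReturnsNull) {
        if (nullReturnsNull) {
            return null; // check box selected: null in, null out
        }
        if (columnType == String.class) {
            return ""; // empty string for string values
        }
        if (Number.class.isAssignableFrom(columnType)) {
            return 0; // 0 for numeric values
        }
        if (columnType == Date.class) {
            return new Date(); // current date for date values
        }
        return null; // assumption: other types are not covered by the documentation
    }

    public static void main(String[] args) {
        System.out.println(resolveNull(String.class, true));
        System.out.println(resolveNull(Long.class, false));
        System.out.println(resolveNull(Date.class, false));
    }
}
```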

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage

This component is an intermediary step. It requires an input flow and an output flow.

Limitation

n/a