Data Masking effects - Cloud

Talend Cloud Data Preparation User Guide

Author: Talend Documentation Team
Version: Cloud
Product: Talend Cloud
Task: Data Quality and Preparation > Cleansing data
Platform: Talend Data Preparation

The parameters available for the Mask data (obfuscation) function, and their effects, depend on the semantic type of the column to which you apply it.

Text and semantic types

For textual data, Talend Data Preparation automatically suggests one of the predefined semantic types, one of your custom semantic types, or the Text type. Predefined and custom semantic types are based on either a regular expression or a dictionary of values.

The following table lists the masking routines available for a column with the Text type, or with any predefined or custom semantic type, and their effects on the example value "Talend in 2018 is awesome".

| Masking routine | Description | Parameters | Output |
| --- | --- | --- | --- |
| Semantic masking | For the Text type, generates random characters that respect the pattern of the original record. For regular expression-based semantic types, generates random records that match the regular expression pattern. For dictionary-based semantic types, randomly replaces the records with values taken from the dictionary used to create the semantic type. Note: Semantic types built with regular expressions that are not compatible with the dk.brics.automaton library do not support semantic masking; for these types, every character of the record is randomly replaced. | Masking mode: Random or Repeatable | Äåòçôî ëð 1889 òn äipïåvu |
| Keep characters between two positions | All the characters included in the selected interval remain as is, while the ones outside the interval are deleted. | Beginning index: 11; End index: 25 | 2018 is awesome |
| Generate from Char Pattern | A record with random characters is created from the pattern of your choice. | Character pattern: aaaaaa 9999 aaaaaaa; Masking mode: Random or Repeatable | õaßayè 8908 æluäco |
| Remove characters between two positions | All the characters included in the selected interval are removed, while the ones outside the interval remain as is. | Beginning index: 7; End index: 14 | Talend is awesome |
| Replace all | All the characters are replaced with the substitute of your choice. | Replacement: x; Masking mode: Random or Repeatable | xxxxxxxxxxxxxxxxxxxxxxxxx |
| Replace all digits | All the digits are replaced with the substitute of your choice. Letters are kept as is. | Replacement: 9; Masking mode: Random or Repeatable | Talend in 9999 is awesome |
| Replace all letters | All the letters are replaced with the substitute of your choice. Digits are kept as is. | Replacement: y; Masking mode: Random or Repeatable | yyyyyy yy 2018 yy yyyyyyy |
| Replace characters between two positions | All the characters included in the selected interval are replaced, while the ones outside the interval remain as is. | Beginning index: 1; End index: 6; Replacement: a; Masking mode: Random or Repeatable | aaaaaa in 2018 is awesome |
| Replace n first characters | Replaces the first n characters with the substitute of your choice, while the following ones remain as is. | Number of characters: 17; Replacement: @; Masking mode: Random or Repeatable | @@@@@@@@@@@@@@@@@ awesome |
| Replace n last characters | Replaces the last n characters with the substitute of your choice, while the previous ones remain as is. | Number of characters: 10; Replacement: !; Masking mode: Random or Repeatable | Talend in 2018 !!!!!!!!!! |
| Keep n first digits and replace following ones | Keeps the first n digits as is and replaces the subsequent ones with random digits. Non-digit characters remain as is. | Number of digits: 1; Masking mode: Random or Repeatable | Talend in 2436 is awesome |
| Keep n last digits and replace previous ones | Keeps the last n digits as is and replaces the previous ones with random digits. Non-digit characters remain as is. | Number of digits: 2; Masking mode: Random or Repeatable | Talend in 1618 is awesome |
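
The deterministic routines in the table above can be sketched in Python to make the index conventions concrete. This is only an illustration, not the product's implementation: the function names are hypothetical, the replacement is assumed to be a single character, and the indexes are assumed to be 1-based and inclusive, which is what the sample outputs suggest.

```python
# Illustrative helpers only: these names are hypothetical and are not part of
# any Talend API. Indexes are assumed to be 1-based and inclusive.

def keep_between(value: str, begin: int, end: int) -> str:
    """Keep the characters in [begin, end] and drop everything else."""
    return value[begin - 1:end]


def remove_between(value: str, begin: int, end: int) -> str:
    """Remove the characters in [begin, end] and keep everything else."""
    return value[:begin - 1] + value[end:]


def replace_all(value: str, replacement: str) -> str:
    """Replace every character, spaces included, with the replacement."""
    return replacement * len(value)


def replace_all_digits(value: str, replacement: str) -> str:
    """Replace digits only; other characters are kept as is."""
    return "".join(replacement if ch.isdigit() else ch for ch in value)


def replace_all_letters(value: str, replacement: str) -> str:
    """Replace letters only; other characters are kept as is."""
    return "".join(replacement if ch.isalpha() else ch for ch in value)


def replace_between(value: str, begin: int, end: int, replacement: str) -> str:
    """Replace the characters in [begin, end]; the rest is kept as is."""
    end = min(end, len(value))  # do not grow the value past its length
    return value[:begin - 1] + replacement * (end - begin + 1) + value[end:]


def replace_n_first(value: str, n: int, replacement: str) -> str:
    """Replace the first n characters."""
    n = min(n, len(value))
    return replacement * n + value[n:]


def replace_n_last(value: str, n: int, replacement: str) -> str:
    """Replace the last n characters."""
    n = min(n, len(value))
    return value[:len(value) - n] + replacement * n


sample = "Talend in 2018 is awesome"
print(keep_between(sample, 11, 25))         # 2018 is awesome
print(remove_between(sample, 7, 14))        # Talend is awesome
print(replace_all_digits(sample, "9"))      # Talend in 9999 is awesome
print(replace_between(sample, 1, 6, "a"))   # aaaaaa in 2018 is awesome
```

The routines that involve randomness (Semantic masking, Generate from Char Pattern, and the two digit-keeping routines) are left out of this sketch because their output cannot be reproduced from the table alone.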

Numeric values

The following table lists the masking routines available for a column containing numeric values, with the Integer or Decimal type, and their effects on the example value 21803.

| Masking routine | Parameters | Output |
| --- | --- | --- |
| Replace with random value | Maximum variation (%): 10; Masking mode: Random or Repeatable | 21499 |
| Generate value between two values | Minimum value: 20000; Maximum value: 22000; Masking mode: Random or Repeatable | 21876 |
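
As a rough sketch of how the two numeric routines could behave, the Python below bounds the result either by a percentage of the original value or by a fixed range. The function names are hypothetical, and the handling of the masking mode is an assumption: this page does not define Repeatable, so the sketch takes it to mean that the same input always produces the same output and models it by seeding the generator from the input value.

```python
import random

# Hypothetical helpers for illustration; they are not part of any Talend API.


def _rng(value: int, repeatable: bool) -> random.Random:
    # Assumption: "Repeatable" means the same input always yields the same
    # output, modeled here by seeding the generator from the input value.
    return random.Random(f"mask:{value}") if repeatable else random.Random()


def replace_with_random_value(value: int, max_variation_pct: float,
                              repeatable: bool = False) -> int:
    """Return a value within +/- max_variation_pct percent of the original."""
    rng = _rng(value, repeatable)
    delta = abs(value) * max_variation_pct / 100
    return round(value + rng.uniform(-delta, delta))


def generate_between(value: int, minimum: int, maximum: int,
                     repeatable: bool = False) -> int:
    """Return a value in [minimum, maximum], ignoring the original magnitude."""
    rng = _rng(value, repeatable)
    return rng.randint(minimum, maximum)


masked = replace_with_random_value(21803, 10)
print(abs(masked - 21803) <= 2181)              # True: within 10% of 21803
print(20000 <= generate_between(21803, 20000, 22000) <= 22000)  # True
```

With a maximum variation of 10, the sample output 21499 falls inside the interval this sketch would draw from, as does 21876 for the range 20000 to 22000.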

Dates

The following table lists the masking routines available for a column with the Date semantic type, and their effects on the example value 05/04/2018.

| Masking routine | Parameters | Output |
| --- | --- | --- |
| Replace with random date | Maximum variation (in days): 365; Masking mode: Random or Repeatable | 23/11/2017 |
| Keep year and set day and month to 01/01 | | 01/01/2018 |
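
The two date routines can be approximated as follows. This is a minimal sketch with hypothetical function names; it assumes the day/month/year format, which the sample output 23/11/2017 implies, and it assumes the variation applies in both directions around the original date.

```python
import random
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical helpers for illustration; they are not part of any Talend API.
FMT = "%d/%m/%Y"  # assumed day/month/year, as implied by 23/11/2017


def replace_with_random_date(value: str, max_variation_days: int,
                             rng: Optional[random.Random] = None) -> str:
    """Shift the date by a random number of days within the variation."""
    rng = rng or random.Random()
    date = datetime.strptime(value, FMT)
    shift = rng.randint(-max_variation_days, max_variation_days)
    return (date + timedelta(days=shift)).strftime(FMT)


def keep_year(value: str) -> str:
    """Keep the year and set the day and month to 01/01."""
    date = datetime.strptime(value, FMT)
    return date.replace(day=1, month=1).strftime(FMT)


print(keep_year("05/04/2018"))  # 01/01/2018
print(replace_with_random_date("05/04/2018", 365))
```

The sample output 23/11/2017 is 133 days before 5 April 2018, so it lies within the 365-day variation.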