
Data Masking effects

The parameters available for the Mask data (obfuscation) function, and their effects, depend on the semantic type of the column to which you apply the function.

Text and semantic types

For textual data, Talend Data Preparation automatically suggests one of the predefined semantic types, one of your custom semantic types, or the Text type. Predefined and custom semantic types can be based on either a regular expression or a dictionary of values.

The following table lists the masking routines available for a column with the Text type, or with any predefined or custom semantic type, and shows their effects on the example value "Talend in 2018 is awesome".

Semantic masking
Description:
  • For regular expression-based semantic types, the function generates random records that match the regular expression pattern.
    Note: Semantic types built with regular expressions that are not compatible with the dk.brics.automaton library do not support semantic masking. In that case, every character of the record is randomly replaced.
  • For dictionary-based semantic types, the function randomly replaces the records with values taken from the dictionary that was used to create the semantic type.
Parameters:
  • Masking mode: Random or Repeatable
Output: Äåòçôî ëð 1889 òn äipïåvu

Keep characters between two positions
Description: All the characters inside the selected interval remain as is, while the ones outside the interval are deleted.
Parameters:
  • Beginning index: 11
  • End index: 25
Output: 2018 is awesome

Generate from Char Pattern
Description: A record made of random characters is generated from the pattern of your choice.
Parameters:
  • Character pattern: aaaaaa 9999 aaaaaaa
  • Masking mode: Random or Repeatable
Output: õaßayè 8908 æluäco

Remove characters between two positions
Description: All the characters inside the selected interval are removed, while the ones outside the interval remain as is.
Parameters:
  • Beginning index: 7
  • End index: 14
Output: Talend is awesome

Replace all
Description: All the characters are replaced with the substitute of your choice.
Parameters:
  • Replacement: x
  • Masking mode: Random or Repeatable
Output: xxxxxxxxxxxxxxxxxxxxxxxxx

Replace all digits
Description: All the digits are replaced with the substitute of your choice. Letters are kept as is.
Parameters:
  • Replacement: 9
  • Masking mode: Random or Repeatable
Output: Talend in 9999 is awesome

Replace all letters
Description: All the letters are replaced with the substitute of your choice. Digits are kept as is.
Parameters:
  • Replacement: y
  • Masking mode: Random or Repeatable
Output: yyyyyy yy 2018 yy yyyyyyy

Replace characters between two positions
Description: All the characters inside the selected interval are replaced, while the ones outside the interval remain as is.
Parameters:
  • Beginning index: 1
  • End index: 6
  • Replacement: a
  • Masking mode: Random or Repeatable
Output: aaaaaa in 2018 is awesome

Replace first n characters
Description: Replaces the first n characters with the substitute of your choice, while the following ones remain as is.
Parameters:
  • Number of characters: 17
  • Replacement: @
  • Masking mode: Random or Repeatable
Output: @@@@@@@@@@@@@@@@@ awesome

Replace last n characters
Description: Replaces the last n characters with the substitute of your choice, while the preceding ones remain as is.
Parameters:
  • Number of characters: 10
  • Replacement: !
  • Masking mode: Random or Repeatable
Output: Talend in 2018 !!!!!!!!!!

Keep first n digits and replace following ones
Description: Keeps the first n digits as is and replaces the subsequent ones with random digits. Non-digit characters remain as is.
Parameters:
  • Number of digits: 1
  • Masking mode: Random or Repeatable
Output: Talend in 2436 is awesome

Keep last n digits and replace previous ones
Description: Keeps the last n digits as is and replaces the preceding ones with random digits. Non-digit characters remain as is.
Parameters:
  • Number of digits: 2
  • Masking mode: Random or Repeatable
Output: Talend in 1618 is awesome
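
To make the index and counting conventions of the text routines concrete, here is a minimal Python sketch that reproduces the deterministic outputs listed above. It is not Talend's implementation: every function name and signature is hypothetical, and the sketch assumes that indexes are 1-based and inclusive, which is what the example outputs imply. The `rng` argument is only a way to supply randomness for the two digit routines; it does not model the Random or Repeatable masking modes.

```python
import random

# Illustrative helpers only: these names are hypothetical and are not part of
# Talend Data Preparation. Assumption: indexes are 1-based and inclusive.

VALUE = "Talend in 2018 is awesome"

def keep_between(value: str, begin: int, end: int) -> str:
    """Keep characters begin..end and drop everything else."""
    return value[begin - 1:end]

def remove_between(value: str, begin: int, end: int) -> str:
    """Drop characters begin..end and keep everything else."""
    return value[:begin - 1] + value[end:]

def replace_between(value: str, begin: int, end: int, replacement: str) -> str:
    """Overwrite characters begin..end with a one-character replacement."""
    masked = replacement * len(value[begin - 1:end])
    return value[:begin - 1] + masked + value[end:]

def replace_first_n(value: str, n: int, replacement: str) -> str:
    n = min(n, len(value))
    return replacement * n + value[n:]

def replace_last_n(value: str, n: int, replacement: str) -> str:
    n = min(n, len(value))
    return value[:len(value) - n] + replacement * n

def replace_all_digits(value: str, replacement: str) -> str:
    return "".join(replacement if c.isdigit() else c for c in value)

def replace_all_letters(value: str, replacement: str) -> str:
    return "".join(replacement if c.isalpha() else c for c in value)

def keep_first_n_digits(value: str, n: int, rng: random.Random) -> str:
    """Keep the first n digits; replace later digits with random digits."""
    out, seen = [], 0
    for c in value:
        if c.isdigit():
            seen += 1
            out.append(c if seen <= n else str(rng.randrange(10)))
        else:
            out.append(c)
    return "".join(out)

def keep_last_n_digits(value: str, n: int, rng: random.Random) -> str:
    """Keep the last n digits; replace earlier digits with random digits."""
    total = sum(c.isdigit() for c in value)
    out, seen = [], 0
    for c in value:
        if c.isdigit():
            seen += 1
            out.append(c if seen > total - n else str(rng.randrange(10)))
        else:
            out.append(c)
    return "".join(out)
```

Under these assumptions, `keep_between(VALUE, 11, 25)` returns "2018 is awesome" and `remove_between(VALUE, 7, 14)` returns "Talend is awesome", matching the outputs in the table. The two digit routines produce different random digits on each run unless you pass a seeded generator such as `random.Random(0)`.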

Numeric values

The following table lists the masking routines available for a column containing numeric values, with the Integer or Decimal type, and shows their effects on the example value 21803.

Replace with random value
Parameters:
  • Maximum variation (%): 10
  • Masking mode: Random or Repeatable
Output: 21499

Generate value between two values
Parameters:
  • Minimum value: 20000
  • Maximum value: 22000
  • Masking mode: Random or Repeatable
Output: 21876
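
The numeric routines can be pictured with a short Python sketch. It is not Talend's implementation: the function names are hypothetical, and the sketch assumes that "Maximum variation (%)" bounds the masked value to within that percentage above or below the original, and that the minimum and maximum values are both inclusive. The page does not confirm either assumption.

```python
import random

# Illustrative helpers only: hypothetical names, not Talend's API.

def replace_with_random_value(value: int, max_variation_pct: float,
                              rng: random.Random) -> int:
    """Return a value within +/- max_variation_pct percent of the input.

    Assumption: the variation is symmetric around the original value.
    """
    delta = abs(value) * max_variation_pct / 100
    return round(value + rng.uniform(-delta, delta))

def generate_between(minimum: int, maximum: int, rng: random.Random) -> int:
    """Return a random integer in [minimum, maximum] (bounds assumed inclusive)."""
    return rng.randint(minimum, maximum)

rng = random.Random()  # pass a seed, e.g. random.Random(0), for reproducible runs
masked = replace_with_random_value(21803, 10, rng)
generated = generate_between(20000, 22000, rng)
```

With a 10% maximum variation, the first function stays within about 2180 of 21803, a range that includes the table's example output 21499. The second function always lands between 20000 and 22000, as does the example output 21876.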

Dates

The following table lists the masking routines available for a column with the Date semantic type, and shows their effects on the example value 05/04/2018.

Replace with random date
Parameters:
  • Maximum variation (in days): 365
  • Masking mode: Random or Repeatable
Output: 23/11/2017

Keep year and set day and month to 01/01
Parameters: N/A
Output: 01/01/2018
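
The two date routines can be sketched in Python as follows. It is not Talend's implementation: the function names are hypothetical, the date format is assumed to be day/month/year (the example output 23/11/2017 can only be read that way), and the variation is assumed to apply both before and after the original date.

```python
import random
from datetime import datetime, timedelta

# Illustrative helpers only: hypothetical names, not Talend's API.
# Assumption: dates are formatted as day/month/year.
FMT = "%d/%m/%Y"

def replace_with_random_date(value: str, max_variation_days: int,
                             rng: random.Random) -> str:
    """Shift the date by a random number of days within +/- max_variation_days."""
    original = datetime.strptime(value, FMT)
    shift = rng.randint(-max_variation_days, max_variation_days)
    return (original + timedelta(days=shift)).strftime(FMT)

def keep_year(value: str) -> str:
    """Keep the year and set the day and month to 01/01."""
    original = datetime.strptime(value, FMT)
    return original.replace(month=1, day=1).strftime(FMT)
```

Under these assumptions, `keep_year("05/04/2018")` returns "01/01/2018", matching the table. The example output 23/11/2017 lies 133 days before 05/04/2018, well inside the 365-day maximum variation.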
