Generate from file/list - 7.3

Data privacy

Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Talend Studio

This function randomly replaces the input value with one of the user-defined values.

This function applies to String and numerical data types.

Method

The Randomly method selects the replacement value at random from the list (or file). As a result, two identical input values can be masked with different output values.

The Consistently method ensures that two identical input values are always masked with the same output value.
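To make the difference between the two methods concrete, here is a minimal Python sketch. It assumes that the Consistently method maps a value to a list index by taking its Java String.hashCode() modulo the list size; this page only says that hashCode() is used to choose an element, so the exact index formula is an assumption. The names java_string_hash, mask_randomly and mask_consistently are hypothetical.

```python
import random


def java_string_hash(s: str) -> int:
    """Reimplement Java's String.hashCode(): h = 31*h + char, as a signed 32-bit int.

    Matches Java for characters in the Basic Multilingual Plane (Java hashes
    UTF-16 code units, so characters outside it would give different results).
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - 0x100000000 if h >= 0x80000000 else h  # reinterpret as signed


def mask_randomly(value, choices, rng=random):
    # Hypothetical sketch of "Randomly": the input value is ignored and every
    # call draws independently, so identical inputs can get different outputs.
    return rng.choice(choices)


def mask_consistently(value, choices):
    # Hypothetical sketch of "Consistently": the output depends only on the
    # input's hash, so identical inputs always get the same output.
    # ASSUMPTION: index = hash mod list size (Python's % never returns a
    # negative result for a positive divisor, like Java's Math.floorMod).
    index = java_string_hash(str(value)) % len(choices)
    return choices[index]


choices = ["help", "documentation", "support"]
print(mask_consistently(21, choices))  # help
print(mask_consistently(21, choices) == mask_consistently(21, choices))  # True
print(mask_randomly(21, choices) in choices)  # True
```

Because the consistent mapping is a pure function of the input, it preserves equality between identical inputs but, as the formulas below describe, two different inputs can still land on the same output value.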

When using the Consistently method, the probability of generating duplicates can be calculated with the following formulas:
  • P = 1 if K < N;
  • P = 1 - (K * (K-1) * (K-2) * … * (K-N+1)) / K^N if K ≥ N,

where P is the probability of generating duplicates, N is the size of the input data, and K is the number of values in the list (or file) given as the extra parameter.

This is the classic birthday problem: the same formula gives the probability that at least two members of a group share the same value.

For example, the probability that at least two people in a group share the same birthday is:
  • 2.7% in a group of 5 people,
  • 41.1% in a group of 20 people,
  • 100% in a group of 367 people, since there are 366 possible birthdays, including February 29.
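The formula can be checked numerically. The following minimal Python sketch (the function name duplicate_probability is hypothetical) evaluates P for N inputs and K list values, and reproduces the birthday figures above using K = 365 days for the first two groups and K = 366 days for the last.

```python
def duplicate_probability(n: int, k: int) -> float:
    """Probability that at least two of n inputs receive the same output
    when each one is mapped uniformly onto k possible values."""
    if n > k:
        # Pigeonhole principle: more inputs than values guarantees a duplicate (P = 1 if K < N).
        return 1.0
    # Probability that all n outputs are distinct: K*(K-1)*...*(K-N+1) / K^N
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (k - i) / k
    return 1.0 - p_all_distinct


print(f"{duplicate_probability(5, 365):.1%}")    # 2.7%
print(f"{duplicate_probability(20, 365):.1%}")   # 41.1%
print(f"{duplicate_probability(367, 366):.1%}")  # 100.0%
```

Multiplying the ratios (K - i) / K one at a time avoids computing K^N directly, which would overflow a floating-point number for large inputs.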
Extra parameter

This function requires an extra parameter, which can be:
  • a comma-separated list of at least two values; or
  • a path to a file containing the values.

When you use a list, the values must be stored in a String and separated by commas, for example: "item1, item2, item3". This function uses the hashCode() method provided by Java to choose an element from the list.

If you use the Apache Spark Batch or Apache Spark Streaming version of the component, add a prefix to the file path:
  • prefix://file path, even if you run the Job in local mode, or
  • hdfs://hdpnameservice1/file path if the file is stored on a cluster.

Paths to folders are not supported.

If the extra parameter is not set, the function returns an empty String or 0, depending on the data type.
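The list format and the fallback behavior can be sketched in Python as follows. This is a minimal illustration, not Talend's implementation: trimming whitespace around each value, dropping empty entries, and choosing between an empty String and 0 from the input's type are all assumptions, and parse_extra_parameter and fallback_value are hypothetical names. Reading values from a file is not covered.

```python
def parse_extra_parameter(extra_parameter):
    """Split a comma-separated extra parameter into a list of values.

    ASSUMPTION: whitespace around each value is trimmed and empty entries
    are dropped; an unset parameter (None or "") yields an empty list.
    """
    if not extra_parameter:
        return []
    values = [v.strip() for v in extra_parameter.split(",")]
    values = [v for v in values if v]
    if len(values) < 2:
        # A list given as the extra parameter must contain at least two values.
        raise ValueError("the extra parameter must list at least two values")
    return values


def fallback_value(value):
    # Result when the extra parameter is not set: an empty String for
    # String input, 0 otherwise (ASSUMPTION: chosen from the input's type).
    return "" if isinstance(value, str) else 0


print(parse_extra_parameter("item1, item2, item3"))  # ['item1', 'item2', 'item3']
print(parse_extra_parameter("help,documentation"))   # ['help', 'documentation']
print(repr(fallback_value("abc")), fallback_value(21))  # '' 0
```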

In the following example, the masked value is one of the values set in the extra parameter.

Input value | Method | Extra parameter | Example of a masked value
21 | Randomly | "help,documentation" | help