The tDataMasking component provides several functions, which vary according to the type of the data column.
It is advisable to use the predefined functions on columns that hold personal information, such as first and last names, email addresses, postal addresses, SSNs, credit card numbers, bank account numbers, race, gender, date of birth and salary.
Functions that are not self-explanatory are described in the table below:
Function |
Description |
---|---|
Set to null |
This function returns null. |
Date Variance |
This function applies only to Date values. It takes a parameter that must be a number and represents a number of days. The function modifies the input date by adding or subtracting a number of days lower than the parameter. For example, if the input date is 15-02-1992 and the parameter is 10, the generated date is randomly selected between 05-02-1992 and 25-02-1992. If the input date is null, the function returns the current date. If the given parameter is 0, null or not a number, it is replaced by 31. For example, if the input date is 05-11-2016, the generated date is randomly selected between 05-10-2016 and 06-12-2016. |
Keep year and set day and month to 01/01 |
This function applies only to Date values. It requires no parameter. It sets the month and day of the input date to January 1 but does not change the year. For example, if the input date is 15-02-1992, the function returns 01-01-1992. |
Generate Account Number |
This function generates a valid French bank account number. It requires no parameter and applies only to String values. A French IBAN is a 27-character code. The digits are randomly generated but satisfy the validation algorithms: the last two digits of the IBAN, known as the "clef RIB", are computed with an algorithm, and the third and fourth characters of the IBAN are check digits that are also computed with an algorithm. |
Generate Account Number and keep original country |
This function works like Generate Account Number, but it generates a valid bank account number for the original country. If the input is a correct IBAN, the function generates an IBAN from the same country as the input, taking into account the IBAN format, which differs from one country to another. If the input is a correct American account number, the function keeps the first nine digits and randomly replaces the others. |
Generate Credit Card |
This function generates a valid credit card number. It requires no parameter and can be applied to String or Long values. Three types of credit card can be generated: Visa, MasterCard or American Express. One of these types is randomly chosen and a credit card number is generated for it. The number is randomly generated and passes the algorithms that detect invalid credit card numbers. |
Generate credit card and keep original bank |
This function works like Generate Credit Card, but it generates a valid credit card number for the original bank. If the input is a correct Visa, MasterCard or American Express credit card number, the function generates a credit card number from the same company and keeps the prefix. Otherwise, the function behaves like Generate Credit Card. |
Generate from Pattern |
This function is applied only on Strings and it requires a parameter. It generates a value that matches the pattern given as parameter. The pattern must follow the below rules: - the - the - the - all other characters are kept as they are. You can generate several strings with the same argument
(value) by using For example, if the given pattern is This function does not work correctly if a comma ',' is used in the pattern. |
Generate Phone Number |
This function applies only to Strings and requires no parameter. It generates a random phone number for one of several countries (France, Germany, Japan, UK and US). |
Generate Social Security Number (SSN) |
This function applies only to Strings and requires no parameter. It generates a valid random SSN for one of several countries (China, France, Germany, India, Japan, UK and US), regardless of the input value. |
Generate unique SSN |
This function is only used on Strings and requires no parameter. It generates a valid unique random SSN related to the input for different countries (China, France, Germany, India, Japan, UK and US). That is to say, if there are duplicates in the input data, you will get the same duplicates in the generated SSNs. In the same way, if there are no duplicates in the input data, there will be no duplicates in the generated SSNs. If the input value is null or is not a valid SSN, the function returns null. |
Generate Sequence
Note:
This function is not supported in the Spark version of the component. |
This function can be applied to any type that is not a date (Integer, Long, String and so on). It requires a parameter that must be a number. This function returns the parameter and, for each row, increases this number by 1. |
Generate Uuid |
This function is only applied on Strings and requires no parameter. It replaces the input value by a randomly generated UUID. This function uses the |
Generate value between two values |
This function generates a value randomly chosen between two values that you give as the argument. The argument must be a string holding the two bounds, min and max, separated by a comma. This function can be applied to any type of field. However, if the field is a date, the bounds must also be dates in the same format as in the schema, dd-MM-yyyy for example. Otherwise, the bounds must be integers. If the input is of Date type, the function returns the
current date if the parameter is not in the right format. Otherwise, it
returns an empty string for string values and |
Keep characters between two positions |
This function can be used on Strings and requires two parameters separated by a comma. The two parameters are the positions of two characters in the input. The function returns a new String that contains only those characters and everything between them. If the input is null or if the parameter is in the wrong format, the function returns an empty String. If the lower bound is lower than 1, it is set to 1, and if the higher bound is greater than the length of the string, it is set to this length. The two parameters can be given in any order: if the argument is 4, 2, it is replaced by 2, 4. |
Remove Characters between two positions |
This function behaves like Keep characters between two positions, but it removes the characters between the two positions instead of keeping them. |
Replace characters between two positions |
This function behaves like Keep characters between two positions, but it replaces the characters between the two positions instead of keeping them. You can enter a third parameter, which is the character used to replace those characters. If you do not enter a third parameter, each character is replaced with a randomly selected character. For example, if the input is Steven and the argument is 2, 4, X, the result is SXXXen. |
Keep n first digits and replace following ones |
This function is used on String, Integer and Long values and requires a number as a parameter. If the parameter is n, the function keeps the first n digits and replaces the following ones. If the parameter is greater than the input length, no modification is applied. |
Keep n last digits and replace previous ones |
This function is the counterpart of Keep n first digits and replace following ones. |
Mask Address |
This function can only be used on String values. It replaces
digits by other digits and everything else by Moreover, there is a list of key words that will not be
transformed: You can give a parameter, it can either be a list of key words to be added to the above list (separated by commas) or it can be a path to a file containing the words. |
Mask email full domain by character |
This function can only be used on Strings. It replaces
everything after the @ character by the character you enter as parameter, or
by a series of If you enter as a parameter something illegal like a string,
a list, multiple characters, a digit, etc. the full email domain will be
masked by a series of For example, if the initial email is
example@talend.com and the given pattern is
B, the generated email looks like
|
Mask email full domain with consistent items |
This function can only be used on Strings. It replaces everything after the @ character randomly by one domain of the list given as parameter (can also be a path to a file containing the domains you want to use). If you do not enter a parameter, everything after the @ character is removed. For example, if the initial email is
example@talend.com and the given parameter is
google.com, yahoo.fr, hotmail.com, the function
chooses randomly a domain from the list and outputs
|
Mask email left part of domain by character |
This function can only be used on Strings. It replaces the
part of the domain before the dot by the character you enter as parameter,
or by a series of If you enter as a parameter something illegal like a string,
a list, multiple characters, a digit, etc. the full email domain will be
masked by a series of For example, if the initial email is
example@talend.com and the given pattern is
B, the generated email looks like
|
Mask email left part of domain with consistent items |
This function can only be used on Strings. It replaces the part of the domain before the dot randomly by a domain name of the list given as parameter (can also be a path to a file containing the domain names you want to use). If you do not enter a parameter, the part of the domain before the dot is removed. For example, if the initial email is
example@talend.com and the given parameter is
google, yahoo.co, hotmail, the function chooses
randomly a domain from the list and outputs
|
Mask email local part by character |
This function can only be used on Strings. It replaces
everything before the @ character by the character you enter as parameter,
or by a series of If you enter as a parameter something illegal like a string,
a list, multiple characters, a digit, etc. the local part of the email will
be masked by a series of For example, if the initial email is
example@talend.com and the given pattern is
B, the generated email looks like
|
Mask email local part with consistent items |
This function can only be used on Strings. It replaces everything before the @ character randomly by one value of the list given as parameter (can also be a path to a file containing the words you want to use). If you do not enter a parameter, everything before the @ character is removed. For example, if the initial email is
example@talend.com and the given parameter is
jdoe, jsmith, pnewman, the function chooses
randomly a value from the list and outputs |
Numeric Variance |
This function applies only to numerical types (Integer, Long, Float and Double). It takes a parameter that must be a number and represents a percentage of modification. The function varies the input value by a random percentage between the parameter and its opposite. For example, if the input is 100 and the parameter is 10, the generated value is randomly selected between 90 and 110. |
Replace all |
This function can be used on Strings and takes a character as a parameter. If the parameter is X, the function replaces all the characters of the input with X. If you do not enter a parameter, each character is replaced with a randomly selected character. If the input is null, the function returns an empty string. |
Replace all digits |
This function can be used on Strings and takes a character as a parameter. If you do not enter a parameter, each digit is replaced with a randomly selected digit. Anything that is not a digit is left unchanged. If the input is null, the function returns an empty string. |
Replace all letters |
This function can be used on Strings and takes a character as a parameter. If you do not enter a parameter, each letter is replaced with a randomly selected character. Anything that is not a letter is left unchanged. If the input is null, the function returns an empty string. |
Replace by consistent items from input list (or file) |
This function modifies the input value by randomly selecting
one of the values given as parameter. The values must be stored in a String
and separated by commas, for example ("item1, item2, item3, etc."). It uses
the It is applied to Strings or numerical types and it ensures
that two similar inputs have the same output. It returns an empty String or
For example, you could use this function to generate SSNs. However, this function may generate duplicates even though there are no duplicates in the input data. To prevent this from happening, use Generate Unique SSN. When using Replace by consistent
items from input list (or file), the probability of
generating duplicates can be calculated using the following formulas:
where Using this approach, it is possible to calculate the probability to find a pair sharing the same value within a group. For example, the probability that, in a group of
n people, two people have the same birthday is the
following:
|
Replace by item from input list (or file) |
This function behaves like Replace by consistent items from input list (or file), but it selects the value from the list (or file) at random for each row, so identical inputs can produce different outputs. |
Replace n first characters |
If the parameter is n, the function replaces the first n characters of the input and keeps all the characters that follow. If the input is null, the function returns an empty string. If the parameter is greater than the input length, all the characters are replaced. You can enter a second parameter, which is the replacement character. For example, if the input is Steven and the argument is 2, X, the result is XXeven. |
Replace n last characters |
This function is the counterpart of Replace n first characters. |
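
The position-based functions in the table (Keep, Remove and Replace characters between two positions) share the same bound handling: positions count from 1, the two positions can be given in any order, and out-of-range bounds are clamped. The Python sketch below illustrates that logic only. It is not the component's implementation: the function names are hypothetical, and both the inclusive treatment of the two positions and the random letter used when no replacement character is given are assumptions.

```python
import random
import string


def _bounds(text, first, second):
    """Turn two 1-based positions into a 0-based [start, end) slice."""
    low, high = sorted((first, second))  # the positions may come in any order
    low = max(low, 1)                    # a lower bound below 1 is set to 1
    high = min(high, len(text))          # an upper bound past the end is set to the length
    return low - 1, high                 # assumption: both positions are inclusive


def keep_between(text, first, second):
    """Keep only the characters between the two positions."""
    if text is None:
        return ""                        # a null input gives an empty string
    start, end = _bounds(text, first, second)
    return text[start:end]


def remove_between(text, first, second):
    """Remove the characters between the two positions."""
    if text is None:
        return ""
    start, end = _bounds(text, first, second)
    return text[:start] + text[end:]


def replace_between(text, first, second, char=None):
    """Replace the characters between the two positions.

    Without a replacement character, each one is replaced by a random
    ASCII letter (an assumption made for this sketch).
    """
    if text is None:
        return ""
    start, end = _bounds(text, first, second)
    masked = "".join(
        char if char is not None else random.choice(string.ascii_letters)
        for _ in range(max(end - start, 0))
    )
    return text[:start] + masked + text[end:]
```

With the input Steven and the positions 2 and 4, these helpers keep `tev`, leave `Sen` after removal, and produce `SXXXen` when the replacement character is X.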
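
The Date Variance and Numeric Variance rows both describe a random shift bounded by the parameter. The following Python sketch shows one way to read those descriptions. The function names are hypothetical, and two points are assumptions rather than documented behavior: the bounds are treated as inclusive, and the percentage is drawn uniformly.

```python
import random
from datetime import date, timedelta


def date_variance(value, days=None):
    """Shift a date by a random number of days within +/- days."""
    # A parameter that is 0, null or not a number is replaced by 31.
    if not isinstance(days, int) or isinstance(days, bool) or days == 0:
        days = 31
    if value is None:
        return date.today()              # a null input gives the current date
    days = abs(days)
    shift = random.randint(-days, days)  # assumption: bounds are inclusive
    return value + timedelta(days=shift)


def numeric_variance(value, percent):
    """Vary a number by a random percentage within +/- percent."""
    rate = random.uniform(-percent, percent)  # assumption: uniform draw
    return value * (100 + rate) / 100
```

For example, `date_variance(date(1992, 2, 15), 10)` returns a date between 5 and 25 February 1992, and `numeric_variance(100, 10)` returns a value between 90 and 110.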
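
The Generate Credit Card row states that generated numbers pass the algorithms that detect invalid credit card numbers, without naming them. The standard checksum for card numbers is the Luhn algorithm, so the sketch below assumes that this is the check meant. The function name is hypothetical.

```python
def luhn_is_valid(number):
    """Return True if the digits of number satisfy the Luhn checksum."""
    digits = [int(c) for c in str(number)]
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

A masked value can be passed through such a check to confirm that it still looks like a well-formed card number, for example `luhn_is_valid("4111111111111111")` for a widely used Visa test number.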
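
The Replace by consistent items from input list (or file) row compares the risk of duplicates to the birthday problem. The sketch below computes that probability under stated assumptions: each of the n distinct input values is mapped independently and uniformly to one of the d items in the list. Neither the function name nor these assumptions come from the component itself.

```python
def duplicate_probability(n, d):
    """Probability that at least two of n independent, uniform picks
    among d possible values are equal (the birthday problem)."""
    if n > d:
        return 1.0  # more picks than values: a duplicate is certain
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (d - i) / d
    return 1.0 - p_all_distinct
```

With d = 365, the result for n = 23 is just above 0.5, which is the classic birthday result. The same function gives a rough estimate of how quickly duplicates appear when many distinct inputs are mapped onto a short replacement list.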