Replaces original data with random characters or figures, protecting the actual data while providing a functional substitute for situations where showing sensitive real data is not advisable.
tDataMasking reads a data set row by row and creates a structurally similar but inauthentic version of the data by applying specific functions to data fields. It generates one output row for each input row.
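The row-by-row behavior described above can be illustrated with a minimal Python sketch. This is not how tDataMasking is implemented; the names `mask_rows`, `replace_all` and `keep_last_four`, the column names and the sample values are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, str]
MaskFn = Callable[[str], str]


def replace_all(value: str) -> str:
    """Hypothetical masking function: replace every character with 'X'."""
    return "X" * len(value)


def keep_last_four(value: str) -> str:
    """Hypothetical masking function: mask all but the last four characters.

    Values of four characters or fewer are returned unchanged in this sketch.
    """
    if len(value) <= 4:
        return value
    return "X" * (len(value) - 4) + value[-4:]


def mask_rows(rows: Iterable[Row], rules: Dict[str, MaskFn]) -> Iterator[Row]:
    """Yield exactly one masked row per input row.

    Columns without a rule are passed through untouched, so the output
    keeps the same structure as the input.
    """
    for row in rows:
        yield {
            col: rules[col](val) if col in rules else val
            for col, val in row.items()
        }


# Illustrative sample data (invented for this sketch).
rows = [
    {"id": "1", "last_name": "Smith", "card": "4111111111111111"},
    {"id": "2", "last_name": "Nguyen", "card": "5500005555555559"},
]
rules = {"last_name": replace_all, "card": keep_last_four}
masked = list(mask_rows(rows, rules))
```

In this sketch each column is paired with at most one function, and the output has the same number of rows and the same columns as the input, which mirrors the "structurally similar" property described above.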
You can use the functional substitute for purposes such as testing and training. When manipulating Personally Identifiable Information (PII) or Sensitive Personal Data (SPD), you might want to protect and mask this data.
The definition of sensitive data is broad and may differ from one country or organization to another. Broadly, sensitive data is any personal or business information whose disclosure poses a risk to the person or company in question.
For example, credit and debit card data is considered sensitive worldwide. Personal sensitive data is any piece of information that can be used to identify or locate a person. A non-exhaustive list of personal sensitive data may include: first and last names, email addresses, addresses, Social Security Number (SSN), credit card numbers, bank account numbers, race, gender, date of birth, salary and geolocation combined with time.
For further information about personal sensitive data, see Personally Identifiable Information.
Also, business sensitive data may include trade secrets, acquisition plans, financial data and customer information, among other possibilities.
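For identifiers such as SSNs, email addresses or account numbers, a masked value is often most useful when it keeps the shape of the original. The following Python sketch shows one possible approach, assuming a hypothetical helper named `mask_preserving_format`; it is not a tDataMasking function, and the sample values are invented.

```python
import random
import string


def mask_preserving_format(value: str, rng: random.Random) -> str:
    """Replace each digit with a random digit and each letter with a random
    letter of the same case; keep separators such as '-', '.' and '@'.

    Hypothetical helper for illustration only. Python's `random` module is
    not cryptographically secure, so this is a sketch rather than a
    production-grade masking routine.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            letters = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(letters))
        else:
            # Punctuation and whitespace carry the structure, so keep them.
            out.append(ch)
    return "".join(out)


# A fixed seed makes the sketch reproducible; real masking would not use one.
rng = random.Random(42)
masked_ssn = mask_preserving_format("123-45-6789", rng)
masked_email = mask_preserving_format("Jane.Doe@example.com", rng)
```

Because the replacement is random, a character can occasionally map to itself, and the same input does not map to the same output across runs unless the generator is seeded. Whether that is acceptable depends on whether masked values need to stay consistent across data sets.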
In local mode, Apache Spark 2.4.0 and later versions are supported.
This component is not shipped with your Talend Studio by default. You need to install it using the Feature Manager. For more information, see Installing features using the Feature Manager.
For more technologies supported by Talend, see Talend components.
Depending on the Talend product you are using, this component can be used in one, some or all of the following Job frameworks:
Standard: see tDataMasking Standard properties.
The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, and in Talend Data Fabric.
Spark Batch: see tDataMasking properties for Apache Spark Batch.
The component in this framework is available in all Talend Platform products with Big Data and in Talend Data Fabric.
Spark Streaming: see tDataMasking properties for Apache Spark Streaming.
The component in this framework is available in Talend Real Time Big Data Platform and in Talend Data Fabric.