Hides original data with random characters or figures to protect the actual data, while providing a functional substitute for situations in which showing real sensitive data is not advisable.
tDataMasking reads a data set row by row and, after applying specific functions to the data fields, creates a structurally similar but inauthentic version of the data. It generates one output row for each input row.
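The row-by-row behavior described above can be pictured with the minimal Java sketch below. It is not Talend's implementation or generated code. The class name `MaskingSketch`, the methods `maskPreservingShape` and `maskRows`, and the sample field names are all hypothetical. The sketch assumes each row is a map of field names to string values. It uses one possible masking function, which replaces each ASCII letter or digit with a random one of the same kind, to produce a structurally similar output row for every input row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration only; not the tDataMasking implementation.
public class MaskingSketch {

    // Replaces each ASCII digit with a random digit and each ASCII letter with a
    // random letter of the same case. Other characters (spaces, dashes, '@', '.')
    // are kept, so the masked value keeps the shape of the original.
    static String maskPreservingShape(String value, Random rnd) {
        if (value == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            if (c >= '0' && c <= '9') {
                sb.append((char) ('0' + rnd.nextInt(10)));
            } else if (c >= 'a' && c <= 'z') {
                sb.append((char) ('a' + rnd.nextInt(26)));
            } else if (c >= 'A' && c <= 'Z') {
                sb.append((char) ('A' + rnd.nextInt(26)));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Processes the rows one at a time and emits exactly one output row per input
    // row. Only the fields listed in sensitiveFields are masked; the input rows are
    // copied, not modified.
    static List<Map<String, String>> maskRows(List<Map<String, String>> rows,
                                              Set<String> sensitiveFields,
                                              Random rnd) {
        List<Map<String, String>> out = new ArrayList<>(rows.size());
        for (Map<String, String> row : rows) {
            Map<String, String> masked = new LinkedHashMap<>(row);
            for (String field : sensitiveFields) {
                if (masked.containsKey(field)) {
                    masked.put(field, maskPreservingShape(masked.get(field), rnd));
                }
            }
            out.add(masked);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "Jane Doe");
        row.put("ssn", "123-45-6789");
        row.put("city", "Paris");
        rows.add(row);

        // "name" and "ssn" are masked; "city" passes through unchanged.
        List<Map<String, String>> masked = maskRows(rows, Set.of("name", "ssn"), new Random(42));
        System.out.println(masked);
    }
}
```

Passing a seeded `Random` makes runs reproducible, which can help when the masked data is used for testing. An unseeded instance produces different masked values on each run.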
You can use this functional substitute for purposes such as testing and training. When handling Personally Identifiable Information (PII) or Sensitive Personal Data (SPD), you may want to protect and mask this data.
The definition of sensitive data is broad and may differ from one country or organization to another. Broadly, sensitive data is personal or business information whose disclosure poses a risk to the person or company concerned.
Credit and debit card data, for example, is considered sensitive worldwide. Sensitive personal data is any piece of information that can be used to identify or locate a person. A non-exhaustive list of sensitive personal data may include: first and last names, email addresses, postal addresses, Social Security Numbers (SSN), credit card numbers, bank account numbers, race, gender, date of birth, salary, and geolocation combined with time.
For further information about personal sensitive data, see Personally Identifiable Information.
Business sensitive data may also include trade secrets, acquisition plans, financial data and customer information, among other things.
In local mode, Apache Spark 1.6.0 and later versions are supported.
For more technologies supported by Talend, see Talend components.
Depending on the Talend product you are using, this component can be used in one, some or all of the following Job frameworks:
Standard: see tDataMasking Standard properties.
The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.
Spark Batch: see tDataMasking properties for Apache Spark Batch.
Spark Streaming: see tDataMasking properties for Apache Spark Streaming.