Executing the Job - 7.3

Data privacy

Version
7.3
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-03-28

Procedure

  1. Save your Job and press F6 to execute it.
    Duplicate data is generated and written to the output file.
  2. Right-click the output component and select Data Viewer to display the duplicate data.
    Duplicate records are marked as false in the ORIGINAL_MARK column.
    Some data has been modified in the Name, City and DOB fields according to the criteria you set in the Modifications table. Duplicate records have been generated based on these modifications.
    For example, compare the original name Mrs Morgan Ross with the duplicate name Mrs M rganosRiss. Two functions have been applied to this duplicate record: the letter o has been exchanged with a space, and a sound has been replaced, turning Ross into Riss. The soundex code, however, is the same before and after the sound replacement.
  3. In the Distribution of duplicates area of the tDuplicateRow basic settings, select a different distribution, for example Bernoulli distribution, and run the Job again.
    Different duplicates are generated from the same input flow according to the selected distribution, as shown in the figure below.
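
To illustrate why Ross and Riss keep the same soundex code, here is a minimal Python sketch of the standard American Soundex algorithm. It is an illustration only: the function name soundex is hypothetical, and the implementation used inside tDuplicateRow is not shown in this procedure and may differ in detail.

```python
def soundex(name: str) -> str:
    """Return a 4-character American Soundex code (illustrative sketch only)."""
    # Map each consonant to its Soundex digit; vowels, H, W and Y have no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]               # the first letter is kept as-is
    prev = codes.get(letters[0], "")  # digit of the previous coded letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                  # H and W do not separate equal digits
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                   # a vowel resets prev to ""

    return (result + "000")[:4]       # pad with zeros, truncate to 4 characters


# Replacing the vowel o with i does not change the code.
print(soundex("Ross"), soundex("Riss"))  # R200 R200
```

Because vowels carry no digit in Soundex, swapping one vowel for another leaves the code untouched, which matches the behavior described for the Ross and Riss example above.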