Define the match analysis - 7.0

Data privacy

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

Procedure

  1. From the Profiling perspective, right-click Metadata and create a file connection to the duplicated_records output file generated by the Job.
    For further information, check the Data Profiling part in the Talend Studio User Guide.
  2. Expand the new file connection under Metadata and select Analyze matches.
  3. Follow the steps in the wizard to define the analysis metadata and click Finish to open the analysis editor.
  4. In the Matching Key table, define a match key on the Code column to group records by their identifier. Records that have the same code are grouped together.
  5. Click Chart below the table to show the duplicates generated according to the Bernoulli distribution selected previously in the Job.
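
The grouping that the match key on the Code column performs can be approximated outside the Studio with a short script. The following is a minimal sketch, not the Studio's implementation: it assumes exact matching on the key, and the function name group_sizes_by_key and the sample records are hypothetical. It counts how many groups contain one record, two records, and so on, which is roughly the kind of summary a duplicates chart conveys.

```python
from collections import Counter


def group_sizes_by_key(records, key="Code"):
    """Group records on an exact match of `key` and return
    {group_size: number_of_groups}, sorted by group size.

    Hypothetical helper for illustration only; it assumes exact
    matching and is not how Talend Studio computes its chart.
    """
    # Count how many records share each key value.
    records_per_key = Counter(record[key] for record in records)
    # Count how many key values have 1 record, 2 records, and so on.
    groups_per_size = Counter(records_per_key.values())
    return dict(sorted(groups_per_size.items()))


# Illustrative sample data (not taken from the duplicated_records file).
records = [
    {"Code": "A1", "Name": "Alice"},
    {"Code": "A1", "Name": "Alicia"},
    {"Code": "B2", "Name": "Bob"},
    {"Code": "C3", "Name": "Carol"},
    {"Code": "C3", "Name": "Carole"},
    {"Code": "C3", "Name": "Caroll"},
]

sizes = group_sizes_by_key(records)
print(sizes)
```

A group of size 1 is a unique record; any larger group contains records that share the same code and are therefore treated as duplicates of one another.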