Executing the Job

Deduplication

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

The tLogRow component is used to present the execution result of the Job. You can configure the presentation mode on its Component view.

To do this, double-click tLogRow to open its Component view and, in the Mode area, select the Table (print values in cells of a table) option.

To execute this Job, press F6.

The Run view opens automatically, where you can check the execution result.

In the result, the last row is the survivor record, as its SURVIVOR column reads true. This record is composed of the best-of-breed data of each column, taken from the four other rows, which are the duplicates in the same group.

The CONFLICT column lists the columns in which more than one field value complies with the given validation rules. Take the credibility column for example: the survivor record has a credibility of 5.0, but the CONFLICT column indicates that the credibility of the second record, GRIZZARD, is 4.0, which is also greater than 3, the threshold set in the rules you have defined. However, because the credibility 5.0 appears in the first record, GRIZZARD CO., tRuleSurvivorship selects it as the best-of-breed data.
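
The behavior described above can be illustrated with a short Python sketch. It is a simplified model based only on the description in this section, not the actual tRuleSurvivorship implementation: the function name resolve_column, the name field, and the row layout are hypothetical, and only the two credibility values mentioned above (5.0 and 4.0) are included. The sketch assumes that the first compliant value in row order survives and that a conflict is flagged when more than one distinct value passes the rule.

```python
# Simplified model of survivor selection and conflict detection for one column.
# Assumption: rows are evaluated in order, the first value that satisfies the
# rule is kept, and a conflict is reported when several distinct values satisfy it.

def resolve_column(rows, column, rule):
    """Return (survivor_value, conflict) for the given column.

    rows   -- list of dicts, one per duplicate record, in group order
    column -- name of the column to resolve
    rule   -- function that returns True when a value complies with the rule
    """
    compliant = [row[column] for row in rows if rule(row[column])]
    if not compliant:
        return None, False
    survivor_value = compliant[0]            # first compliant value in row order
    conflict = len(set(compliant)) > 1       # more than one distinct compliant value
    return survivor_value, conflict


# Only the two records named in the text; the other duplicates are omitted.
rows = [
    {"name": "GRIZZARD CO.", "credibility": 5.0},
    {"name": "GRIZZARD", "credibility": 4.0},
]

# Rule taken from the text: credibility must be greater than 3.
survivor, conflict = resolve_column(rows, "credibility", lambda value: value > 3)
print(survivor, conflict)  # prints "5.0 True"
```

Here both 5.0 and 4.0 pass the rule, so the column is reported as a conflict, while 5.0 is kept because it comes from the first record.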