Executing the Job to label suspect pairs with assigned labels - 7.0

Matching with machine learning

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Data Stewardship
Talend Studio

Procedure

Press F6 to execute the Job.

Results

tMatchPredict labels the suspect pairs, groups the suspect records that match the YES label, and writes all the suspect pairs to the output file.

The suspect records that match the YES label belong to groups because tMatchPredict was configured to group the records that match this clustering class.

The records labeled with the NO label do not belong to any group.
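The grouping behavior described above can be pictured with a small sketch. This is not tMatchPredict's implementation: it assumes that records linked by YES-labeled pairs are merged transitively into one group, and the record identifiers, the tuple layout, and the `group_yes_pairs` function are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def group_yes_pairs(labeled_pairs):
    """Group record ids that are connected by YES-labeled pairs.

    Assumption: grouping is transitive (if A~B and B~C, then A, B and C
    share a group). Records that appear only in NO-labeled pairs are
    never registered, so they end up in no group.
    """
    parent = {}

    def find(x):
        # Register x on first sight, then walk up to the group root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right, label in labeled_pairs:
        if label == "YES":
            parent[find(left)] = find(right)

    groups = defaultdict(list)
    for record_id in list(parent):
        groups[find(record_id)].append(record_id)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical labeled suspect pairs: (left record, right record, label).
pairs = [
    ("r1", "r2", "YES"),
    ("r2", "r3", "YES"),
    ("r4", "r5", "NO"),
]
groups = group_yes_pairs(pairs)
```

In this sketch, r1, r2 and r3 form one group, while r4 and r5 stay ungrouped because their only pair carries the NO label.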

What to do next

You can now create a single representation of each group of duplicates and merge these representations with the unique rows computed by tMatchPairing.

For an example of how to create a clean and deduplicated dataset, see Creating a clean data set from the suspect pairs labeled by tMatchPredict and the unique rows computed by tMatchPairing.
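As a rough illustration of the next step, the sketch below builds one record per group of duplicates and appends the unique rows. It is not the Talend Job referenced above: the "first non-empty value wins" rule, the field names, the sample data, and the `survivor` function are all assumptions made for the example.

```python
def survivor(records):
    """Build one record from a group of duplicates.

    Assumed rule: for each field, keep the first non-empty value found.
    """
    merged = {}
    for record in records:
        for field, value in record.items():
            if value not in (None, "") and field not in merged:
                merged[field] = value
    return merged


# Hypothetical grouped suspect records (group id -> records in the group).
grouped = {
    "g1": [
        {"name": "Ann Lee", "email": ""},
        {"name": "Ann Lee", "email": "ann@example.com"},
    ],
}

# Hypothetical unique rows, standing in for the unique rows output.
unique_rows = [{"name": "Bob Roy", "email": "bob@example.com"}]

# One representative per group, followed by the rows that had no duplicates.
clean = [survivor(members) for members in grouped.values()] + unique_rows
```

The resulting `clean` list holds one record per real-world entity: a single merged record for group g1, plus the row that had no duplicates.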