Executing the Job to label suspect pairs with assigned labels - 6.5

Matching with machine learning

EnrichVersion
6.5
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

Procedure

Press F6 to execute the Job.

Results

tMatchPredict labels the suspect pairs, groups the suspect records that match the YES label, and writes all the suspect pairs to the output file.

The suspect records that match the YES label belong to groups because tMatchPredict was configured to group records that match this clustering class.

The records labeled with the NO label do not belong to any group.
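
If you want to confirm this behavior outside the Studio, a small script can count how many records carry a group identifier for each label. The following is a minimal sketch, not part of the Job: the column names LABEL and GRP_ID, the semicolon delimiter, and the sample rows are assumptions made for illustration, so adjust them to the schema of your own output file.

```python
import csv
import io

# Hypothetical column names; adjust them to the schema of your output file.
LABEL_COL = "LABEL"
GROUP_COL = "GRP_ID"


def summarize_labels(csv_text, delimiter=";"):
    """Count, for each label, the records with and without a group identifier."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    summary = {}
    for row in reader:
        label = row[LABEL_COL]
        # A record counts as grouped when its group column is not blank.
        grouped = bool((row.get(GROUP_COL) or "").strip())
        counts = summary.setdefault(label, {"grouped": 0, "ungrouped": 0})
        counts["grouped" if grouped else "ungrouped"] += 1
    return summary


# Illustrative rows only, not real Job output.
sample = (
    "id;name;LABEL;GRP_ID\n"
    "1;Jon Smith;YES;g1\n"
    "2;John Smith;YES;g1\n"
    "3;Ann Lee;NO;\n"
)

print(summarize_labels(sample))
```

With output that follows the results described above, every record labeled YES is counted as grouped and every record labeled NO as ungrouped.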

What to do next

You can now create a single representation of each duplicate group and merge these representations with the unique rows computed by tMatchPairing.

For an example of how to create a clean and deduplicated data set, see Creating a clean data set from the suspect pairs labeled by tMatchPredict and the unique rows computed by tMatchPairing.
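
To picture what this next step produces, the sketch below keeps one record per group and appends the unique rows. It is an illustration under stated assumptions, not the method used in the referenced example: the GRP_ID column name, the function name build_clean_dataset, the sample records, and the rule "keep the record with the most filled-in fields" are all hypothetical.

```python
from collections import defaultdict


def build_clean_dataset(grouped_records, unique_rows, group_key="GRP_ID"):
    """Keep one representative per group, then append the unique rows."""
    groups = defaultdict(list)
    for record in grouped_records:
        groups[record[group_key]].append(record)

    def filled_fields(record):
        # Number of non-empty values, ignoring the group identifier itself.
        return sum(1 for key, value in record.items() if key != group_key and value)

    representatives = []
    for members in groups.values():
        # Assumed rule: the most complete record represents its group.
        best = max(members, key=filled_fields)
        representatives.append({k: v for k, v in best.items() if k != group_key})

    return representatives + list(unique_rows)


# Illustrative records only, not real Job output.
grouped = [
    {"GRP_ID": "g1", "name": "Jon Smith", "city": ""},
    {"GRP_ID": "g1", "name": "John Smith", "city": "Paris"},
]
unique = [{"name": "Ann Lee", "city": "Lyon"}]

clean = build_clean_dataset(grouped, unique)
print(clean)
```

In practice the choice of representative depends on your own survivorship rules, for example the most recent or the most trusted record.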