Procedure
- Double-click tMatchModel to display the Basic settings view and define the component properties.
- In the Matching Key table, click the [+] button to add rows to the table and select the columns on which you want to base the match computation.
The Original_Id column is ignored in the computation of the matching model.
- From the matching label column list, select the column which holds the labels you added on the suspect records.
- Select the Save the model on file system check box and in the Folder field, set the path to the local folder where you want to generate the matching model file.
- Click Advanced settings and set the following parameters:
- In the corresponding field, set the maximum number of tokens to be used in the phonetic comparison.
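- In the Random Forest hyper parameters tuning area, enter the ranges for the number of decision trees you want to build and for their depth.
These parameters are important for the accuracy of the model: more trees generally make predictions more stable, and deeper trees are more expressive, but both increase training time, and trees that are too deep can overfit the labeled sample.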
- Leave the other parameters at their default values.
- Press F6 to execute the Job and generate the matching model in the output folder.
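To get a feel for what the ranges in the Random Forest hyper parameters tuning step imply, the following minimal Python sketch expands a range for the number of trees and a range for the tree depth into the list of candidate combinations. It is only an illustration: the "low:high" range notation, the function names and the dictionary keys are assumptions made for this example and do not describe how tMatchModel reads its fields or searches for the best model.

```python
from itertools import product


def parse_range(text):
    """Parse a hypothetical inclusive range such as "5:7" into [5, 6, 7]."""
    low, high = (int(part) for part in text.split(":"))
    if low > high:
        raise ValueError(f"lower bound {low} is greater than upper bound {high}")
    return list(range(low, high + 1))


def candidate_settings(trees_range, depth_range):
    """List every (number of trees, maximum depth) combination in the two ranges."""
    return [
        {"num_trees": trees, "max_depth": depth}
        for trees, depth in product(parse_range(trees_range), parse_range(depth_range))
    ]


# Illustrative values only: 3 tree counts x 2 depths = 6 candidate models.
grid = candidate_settings("5:7", "3:4")
for setting in grid:
    print(setting)
```

The number of combinations grows with the product of the two range sizes, so wide ranges can noticeably lengthen the time needed to generate the model.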
Results
You can now use this model with the tMatchPredict component to label all the duplicates computed by tMatchPairing.
For further information, see Labeling suspect pairs with assigned labels.