How to simulate the matching of staging data records - 6.3

Talend Data Fabric Studio User Guide

In the staging data container browser, Talend Studio allows you to simulate the matching of staging data records retrieved from a specific data container and check the match result. If records match, you can also view the match details. For more information about data containers and how to browse a data container, see Data Containers.

Prerequisite(s): A match rule has been defined and attached to a data model, and both the match rule and the data model have been deployed to the MDM server. For more information about how to attach a match rule to a data model, see Attaching a Match Rule to a Data Model.

Note

Match simulation does not take into account the built-in blocking key, even if you chose to use it when defining the match rule.
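As noted above, match simulation does not take into account the built-in blocking key. In a match run that uses blocking, only records sharing the same blocking-key value are compared with each other; without it, every pair of selected records is a candidate. The Python sketch below is a generic illustration of that difference, not Talend code. The function `candidate_pairs` and the record fields `name` and `zip` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, blocking_key=None):
    """Return the index pairs of records that would be compared.

    Without a blocking key, every pair is compared. With one, only records
    that share the same blocking-key value are compared with each other.
    """
    if blocking_key is None:
        return list(combinations(range(len(records)), 2))
    blocks = defaultdict(list)
    for index, record in enumerate(records):
        blocks[blocking_key(record)].append(index)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical staging records.
records = [
    {"name": "Smith", "zip": "75001"},
    {"name": "Smyth", "zip": "75001"},
    {"name": "Smith", "zip": "69001"},
]

print("without blocking:", candidate_pairs(records))
print("blocking on zip: ", candidate_pairs(records, blocking_key=lambda r: r["zip"]))
```

Here records 0 and 2 share a name but fall into different blocks, so only an unblocked comparison, as in the simulation, would score that pair.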

To simulate a match operation on staging data records, do the following:

  1. In the MDM Repository tree view, expand the Data Container node.

  2. Double-click the data container from which you want to run the match simulation to open the data container editor.

  3. Click the Staging Data Container tab to open the staging data container view.

  4. Click the icon to retrieve data records of all entities.

    You can define criteria to narrow down the data records you want to retrieve. For more information about how to browse a data container, see How to browse a data container.

  5. Select two or more records that belong to the same entity, right-click the selection, and then select Simulate Match from the contextual menu.

  6. The [Match Result] dialog box opens, displaying the match result for the selected data records.

    If a data record does not match any of the other selected records, it is placed in a separate group of its own.

    The match result includes the following information:

    • GRP_SIZE: Indicates the number of similar staging data records that are grouped together.

    • CONFIDENCE: Indicates the confidence score, which is computed from the weighted match scores and normalized to a value between 0 and 1.

    • SCORE: Indicates the consolidated match score (that is, how similar two or more data records are), computed from all match keys defined in the match rule attached to the data model entity and displayed as a percentage with two decimal places. Hover over the score to view it as a decimal.

    • ATTR_SCORE: Indicates the match score computed for each match key defined in the match rule attached to the data model entity, displayed as a percentage with two decimal places. Hover over the score to view it as a decimal.

  7. You can also simulate the match operation on edited data records. To do so, click the Edit Records button to open the [Edit Records] dialog box.

    Review the data records and edit them as needed. Then click the Rerun Simulation button and check the new match result in the [Match Result] dialog box.

  8. If needed, click the DETAILS field in the first row, and then click the [...] button to open the [Match Detail] dialog box, which shows how the data records are matched.

    After checking the match details, click OK to close the dialog box.

  9. Once you are done with the match simulation operation, click OK to close the [Match Result] dialog box.
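The SCORE and ATTR_SCORE columns in the [Match Result] dialog box are easier to interpret with a concrete calculation in mind. The Python sketch below assumes that the consolidated SCORE is a weighted average of the per-key ATTR_SCORE values, each between 0 and 1. This formula, the match-key names, and the weights are illustrative assumptions, not Talend's documented algorithm. The sketch also shows the two display forms the dialog uses: a percentage with two decimal places and the underlying decimal.

```python
def consolidated_score(attr_scores, weights):
    """Assumed formula: weighted average of per-key scores, each in [0, 1]."""
    total_weight = sum(weights[key] for key in attr_scores)
    if total_weight == 0:
        raise ValueError("at least one match key needs a non-zero weight")
    weighted_sum = sum(attr_scores[key] * weights[key] for key in attr_scores)
    return weighted_sum / total_weight


def as_percentage(score):
    """Format a score in [0, 1] as a percentage with two decimal places."""
    return f"{score * 100:.2f}%"


# Hypothetical per-key similarity scores (ATTR_SCORE) and weights.
attr_scores = {"firstName": 0.9, "lastName": 1.0, "city": 0.75}
weights = {"firstName": 1.0, "lastName": 2.0, "city": 1.0}

for key, value in attr_scores.items():
    print("ATTR_SCORE", key, as_percentage(value))

score = consolidated_score(attr_scores, weights)
print("SCORE", as_percentage(score), "(decimal:", score, ")")
```

Giving `lastName` twice the weight of the other keys pulls the consolidated score toward that key's similarity.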
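In the [Match Result] dialog box, a record that matches no other record is placed in a group of its own, and GRP_SIZE reports how many records each group contains. A minimal Python sketch of that grouping behavior follows. It assumes that two records match when a hypothetical scoring function reaches a hypothetical threshold, and that matches are chained transitively with a union-find structure. `group_records` and `same_name` are illustrative names, and Talend's actual grouping strategy may differ.

```python
from itertools import combinations


def group_records(records, score_fn, threshold):
    """Group record indices so that records whose pairwise score reaches
    `threshold` share a group; matches are chained transitively (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if score_fn(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def same_name(a, b):
    # Hypothetical scoring function: 1.0 for a case-insensitive match, else 0.0.
    return 1.0 if a.lower() == b.lower() else 0.0


groups = group_records(["Smith", "smith", "Jones"], same_name, threshold=0.8)
for group in groups:
    print("records", group, "GRP_SIZE", len(group))
```

"Jones" matches neither of the other records, so it ends up alone in a group with a GRP_SIZE of 1.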