
Grouping the duplicate records

Procedure

  1. Right-click tMatchGroup to open its contextual menu and select Configuration Wizard.
    From the wizard, you can see what your groups look like and adjust the component settings until similar records are matched correctly.
  2. Click the plus button under the Key Definition table to add one row.
  3. In the Input Key Attribute column of this row, select acctName. This column then becomes the reference used to identify duplicates in the input data.
  4. In the Matching Function column, select the Jaro-Winkler matching algorithm.
  5. In the Match threshold field, enter the minimum similarity score at which two record fields are considered a match. In this example, enter 0.6.
  6. Click Chart to execute this matching rule and show the result in this wizard.
    If the input records are not all placed in one single group, replace 0.6 with a smaller value and click Chart again. Repeat until all four records are in the same group.
    The Job in this scenario puts four similar records into one single duplicates group so that tRuleSurvivorship can create one survivor from them. This simple sample gives you a clear picture of how tRuleSurvivorship works with other components to create the best data. In a real-world case, however, you may need to process much more data with more complex duplicate situations, and the data will therefore be split into many more groups.
  7. Click OK to close the Configuration wizard. The Basic settings view of the tMatchGroup component is automatically filled with the parameters you have set.
    For further information about the Configuration wizard, see Configuration wizard.
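
To make the effect of the match threshold more concrete, the following is a minimal Python sketch of Jaro-Winkler scoring combined with threshold-based grouping. It is an illustration only and does not reproduce the internal logic of tMatchGroup. The function names, the sample acctName values, and the grouping strategy (each record is compared with the first record of every existing group) are all assumptions made for this example.

```python
def jaro(s1, s2):
    """Return the Jaro similarity of two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def group_records(names, threshold):
    """Hypothetical grouping: a record joins the first group whose first
    member scores at or above the threshold, otherwise it starts a new group."""
    groups = []
    for name in names:
        for group in groups:
            if jaro_winkler(group[0], name) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Made-up acctName values, not the data used in this scenario.
acct_names = ["Acme Corp", "Acme Corp.", "Acme Corporation", "Acme Co"]
for threshold in (0.99, 0.6):
    print(threshold, group_records(acct_names, threshold))
```

In this sketch, lowering the threshold lets more records fall into the same group, which mirrors the adjustment made in the wizard when clicking Chart again with a smaller value.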
