Configuring the second pass - 7.0

Data matching

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

Procedure

  1. In the basic settings of the tMatchGroup labelled pass2, select Simple VSR from the Matching Algorithm list.
    In this scenario, the match rule is based on the VSR algorithm.
  2. Click the Preview button to display the Configuration Wizard.
    If this component does not have the same schema as the preceding component, a warning icon appears. In that case, click the Sync columns button to retrieve the schema from the preceding component. Once the schemas match, the warning icon disappears.
  3. In the Key Definition table, click the [+] button to add the column(s) on which you want to do the matching operation, lname in this scenario.
    Note: When you apply an algorithm or a matching algorithm to a date column, you can decide which part of the date to compare.

    For example, to compare only the year of the date, set the type of the date column to Date in the component schema and enter "yyyy" in the Date Pattern field. The component then converts the date to a string according to the pattern defined in the schema before it starts a string comparison.

  4. Select the Jaro-Winkler algorithm in the Matching Function column.
  5. Set Weight to 1. In the Handle Null column, select the null operator to use for null attributes in the column, Null Match Null in this scenario.
  6. Click the [+] button below the Blocking Selection table to add a row, then click in the new row and select from the list the column to use as a blocking value, T_GEN_KEY1 in this example.
  7. Click the Advanced settings tab and select the Multi-pass check box. This option enables tMatchGroup to receive data sets from the tMatchGroup that precedes it in the Job.
  8. In the Advanced settings view, select the Sort the output data by GID check box to sort the output data by group ID.
  9. Select the Deactivate matching computation when opening the wizard check box if you do not want to run the match rules the next time you open the wizard.
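
To make the effect of steps 3 to 6 more concrete, the following Java sketch compares lname values with a textbook Jaro-Winkler function, applies a Null Match Null rule, and compares records only when they share a blocking value. It is an illustration, not the code that tMatchGroup generates: the class and method names, the 0.85 threshold, and the sample records are all assumptions made for this example. With a single key column of weight 1, the record score is simply the score of that column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and threshold are assumptions, not Talend APIs.
class SecondPassSketch {

    // Jaro similarity in [0, 1].
    static double jaro(String a, String b) {
        if (a.equals(b)) return 1.0;
        int la = a.length(), lb = b.length();
        if (la == 0 || lb == 0) return 0.0;
        int window = Math.max(0, Math.max(la, lb) / 2 - 1);
        boolean[] ma = new boolean[la];
        boolean[] mb = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(lb - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!mb[j] && a.charAt(i) == b.charAt(j)) {
                    ma[i] = true; mb[j] = true; matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order.
        int outOfOrder = 0, k = 0;
        for (int i = 0; i < la; i++) {
            if (!ma[i]) continue;
            while (!mb[k]) k++;
            if (a.charAt(i) != b.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / la + m / lb + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    // Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
    static double jaroWinkler(String a, String b) {
        double j = jaro(a, b);
        int max = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < max && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    // "Null Match Null": two nulls match, a null never matches a value.
    static double compare(String a, String b) {
        if (a == null && b == null) return 1.0;
        if (a == null || b == null) return 0.0;
        return jaroWinkler(a, b);
    }

    // Each record is {blockingValue, lname}. Only records that share a
    // blocking value are compared. Returns index pairs scoring >= threshold.
    static List<int[]> matchingPairs(List<String[]> records, double threshold) {
        Map<String, List<Integer>> blocks = new LinkedHashMap<>();
        for (int i = 0; i < records.size(); i++) {
            blocks.computeIfAbsent(records.get(i)[0], key -> new ArrayList<>()).add(i);
        }
        List<int[]> pairs = new ArrayList<>();
        for (List<Integer> block : blocks.values()) {
            for (int x = 0; x < block.size(); x++) {
                for (int y = x + 1; y < block.size(); y++) {
                    int i = block.get(x), j = block.get(y);
                    if (compare(records.get(i)[1], records.get(j)[1]) >= threshold) {
                        pairs.add(new int[] {i, j});
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"K1", "Smith"});
        records.add(new String[] {"K1", "Smyth"});
        records.add(new String[] {"K2", "Smith"}); // other block, never compared
        for (int[] p : matchingPairs(records, 0.85)) {
            System.out.println(p[0] + " ~ " + p[1]);
        }
    }
}
```

In this sketch, records 0 and 1 are paired because they share a blocking value and their names are similar enough, while record 2 is never compared with them because its blocking value differs.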
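
The note in step 3 about date patterns can be illustrated with a short Java sketch. It assumes that the Date Pattern field follows the java.text.SimpleDateFormat pattern syntax, which the "yyyy" example suggests, and it uses plain string equality in place of the configured matching function. The class and method names are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical sketch; not the code generated by the component.
class DatePatternSketch {

    // Convert a date to the string that would be compared under the pattern.
    static String asComparableString(Date date, String pattern) {
        return new SimpleDateFormat(pattern).format(date);
    }

    // Simplification: exact string equality stands in for the matching function.
    static boolean sameUnderPattern(Date a, Date b, String pattern) {
        return asComparableString(a, pattern).equals(asComparableString(b, pattern));
    }

    public static void main(String[] args) {
        // GregorianCalendar months are zero-based: 0 is January, 10 is November.
        Date a = new GregorianCalendar(1985, 0, 15).getTime();
        Date b = new GregorianCalendar(1985, 10, 3).getTime();
        System.out.println(sameUnderPattern(a, b, "yyyy"));       // same year
        System.out.println(sameUnderPattern(a, b, "yyyy-MM-dd")); // different days
    }
}
```

With the "yyyy" pattern, the two dates reduce to the same string even though their months and days differ, which is why the pattern controls what the comparison actually sees.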