Configuring the first pass - 7.3

Data matching with Talend tools

Version
7.3
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-06

Procedure

  1. In the basic settings of the tMatchGroup labelled pass1, select Simple VSR from the Matching Algorithm list.
    In this scenario, the match rule is based on the VSR algorithm.
  2. Click the Preview button to display the Configuration Wizard.
  3. Import matching keys from the match rules created and tested in the Profiling perspective of Talend Studio and use them in your Job. Otherwise, define the matching key parameters as described in the steps below.
    The rule you import or define must be of the same type as the algorithm selected in the basic settings of the component. Otherwise, the Job runs with default values for the parameters that are not compatible between the two algorithms.
  4. In the Key definition table, click the [+] button to add the column(s) on which you want to do the matching operation, lname in this scenario.
    Note: When you select a date column on which to apply an algorithm or a matching algorithm, you can decide what to compare in the date format.

    For example, to compare only the year in the date, set the type of the date column to Date in the component schema and enter "yyyy" in the Date Pattern field. The component then converts the date to a string according to the pattern defined in the schema before starting a string comparison.

  5. Select the Jaro-Winkler algorithm in the Matching Function column.
  6. From the Tokenized measure list, select Any order.
  7. Set Weight to 1 and in the Handle Null column, select the null operator you want to use to handle null attributes in the column, Null Match Null in this scenario.
  8. Click the [+] button below the Blocking Selection table to add one row to the table, then click in the row and select from the list the column you want to use as a blocking value, T_GEN_KEY in this example.
    Using a blocking value reduces the number of record pairs that need to be examined. The input data is partitioned into exhaustive blocks based on the functional key, and comparison is restricted to record pairs within each block.
  9. If required, click Edit schema to open the schema editor and see the schema retrieved from the previous component in the Job.
  10. Click the Advanced settings tab and select the Sort the output data by GID check box to arrange the output data by their group IDs.
  11. Select the Deactivate matching computation when opening the wizard check box if you do not want to run the match rules the next time you open the wizard.
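
Example: effect of a blocking value on the number of comparisons

The effect of the blocking value described in step 8 can be illustrated outside Talend Studio. The following Python sketch is not Talend code and does not reproduce how tMatchGroup is implemented. The sample records, the T_GEN_KEY values and the helper names (count_pairs, block_by, candidate_pairs) are all invented for illustration. It only shows how many record pairs would be compared with and without blocking.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample records. T_GEN_KEY stands in for the functional key
# generated upstream; the values below are invented for this illustration.
records = [
    {"id": 1, "lname": "Smith",  "T_GEN_KEY": "S530"},
    {"id": 2, "lname": "Smyth",  "T_GEN_KEY": "S530"},
    {"id": 3, "lname": "Smithe", "T_GEN_KEY": "S530"},
    {"id": 4, "lname": "Jones",  "T_GEN_KEY": "J520"},
    {"id": 5, "lname": "Jonas",  "T_GEN_KEY": "J520"},
    {"id": 6, "lname": "Brown",  "T_GEN_KEY": "B650"},
]


def count_pairs(rows):
    """Number of unordered pairs when every record is compared to every other."""
    n = len(rows)
    return n * (n - 1) // 2


def block_by(rows, key):
    """Partition the rows into blocks that share the same blocking value."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[key]].append(row)
    return dict(blocks)


def candidate_pairs(rows, key):
    """Return the id pairs that remain when comparison is limited to each block."""
    pairs = []
    for block in block_by(rows, key).values():
        pairs.extend((a["id"], b["id"]) for a, b in combinations(block, 2))
    return pairs


pairs_without_blocking = count_pairs(records)
pairs_with_blocking = candidate_pairs(records, "T_GEN_KEY")

print("Pairs without blocking:", pairs_without_blocking)
print("Pairs with blocking:", len(pairs_with_blocking))
```

In this made-up data set, only records that share a blocking value are paired, so a record such as Brown, which is alone in its block, is never compared. A matching function such as Jaro-Winkler would then be applied only to the remaining pairs.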
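
Example: comparing dates through a date pattern

The note in step 4 explains that a date column is converted to a string according to the schema date pattern before the string comparison starts. The Python sketch below illustrates that idea only. It assumes that the Python format code "%Y" plays the role of the "yyyy" date pattern, uses exact string equality in place of whichever matching function is configured, and relies on hypothetical helper names (to_pattern_string, same_under_pattern) that do not exist in Talend.

```python
from datetime import date


def to_pattern_string(value, pattern="%Y"):
    """Render a date as a string; "%Y" is assumed to correspond to "yyyy"."""
    return value.strftime(pattern)


def same_under_pattern(a, b, pattern="%Y"):
    """Compare two dates only on the parts kept by the pattern.

    Exact equality is a stand-in for the configured matching function.
    """
    return to_pattern_string(a, pattern) == to_pattern_string(b, pattern)


# Invented sample dates.
d1 = date(1985, 3, 14)
d2 = date(1985, 11, 2)
d3 = date(1986, 3, 14)

print(same_under_pattern(d1, d2))            # same year, different month
print(same_under_pattern(d1, d3))            # different year
print(same_under_pattern(d1, d2, "%Y-%m"))   # year and month both compared
```

With a year-only pattern, the first two dates are treated as equal because only the year survives the conversion. Widening the pattern to include the month makes them differ again.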