Defining a blocking key from the match analysis - Cloud - 7.3

Talend Studio User Guide

Last publication date
2024-02-13
Available in...

Big Data Platform

Cloud API Services Platform

Cloud Big Data Platform

Cloud Data Fabric

Cloud Data Management Platform

Data Fabric

Data Management Platform

Data Services Platform

MDM Platform

Real-Time Big Data Platform

About this task

Defining a blocking key is not mandatory but is advisable. A blocking key partitions the data into blocks, which reduces the number of record pairs to be examined because comparisons are restricted to pairs within the same block. Blocking keys are especially useful when you are processing large data sets.
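To make the effect of blocking concrete, the following is a minimal Python sketch, not Talend code. It assumes a hypothetical blocking key (the upper-cased first letter of a `last_name` field) and made-up sample records, and compares the number of candidate pairs with and without blocking.

```python
from collections import defaultdict
from itertools import combinations


def count_pairs_without_blocking(records):
    """Number of pairwise comparisons when every record is compared with every other."""
    n = len(records)
    return n * (n - 1) // 2


def group_by_blocking_key(records, key_func):
    """Partition records into blocks that share the same blocking key value."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key_func(record)].append(record)
    return dict(blocks)


def candidate_pairs(blocks):
    """Yield only the record pairs that fall inside the same block."""
    for members in blocks.values():
        yield from combinations(members, 2)


def first_letter_key(record):
    # Hypothetical blocking key for illustration: first letter of the last name.
    return record["last_name"][:1].upper()


# Made-up sample data for illustration only.
records = [
    {"id": 1, "last_name": "Smith"},
    {"id": 2, "last_name": "Smyth"},
    {"id": 3, "last_name": "Jones"},
    {"id": 4, "last_name": "Johnson"},
    {"id": 5, "last_name": "Brown"},
]

blocks = group_by_blocking_key(records, first_letter_key)
pairs = list(candidate_pairs(blocks))

print("Pairs without blocking:", count_pairs_without_blocking(records))
print("Pairs with blocking:", len(pairs))
```

With these five records, only the pairs inside the "S" and "J" blocks are compared. The trade-off is that two records placed in different blocks are never compared, so the key should be chosen so that likely duplicates end up in the same block.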

Procedure

  1. In the rule editor, in the Generation of Blocking Key section, click the [+] button to add a row to the table.
  2. Set the parameters of the blocking key as follows:
    • Blocking Key Name: Enter a name for the column you want to use to reduce the number of record pairs that need to be compared.

    • Pre-algorithm: Select an algorithm from the drop-down list and set its value where necessary.

      Defining a pre-algorithm is not mandatory. This algorithm is used to clean or standardize data before processing it with the match algorithm and thus improve the results of data matching.

    • Algorithm: Select the match algorithm you want to use from the drop-down list and set its value where necessary.

    • Post-algorithm: Select an algorithm from the drop-down list and set its value where necessary.

      Defining a post-algorithm is not mandatory. This algorithm is used to clean or standardize data after processing it with the match algorithm and thus improve the outcome of data matching.

  3. If required, follow the same steps to add as many blocking keys as needed.
    When you import a rule with several blocking keys into the match analysis editor, only one blocking key is generated and listed in the BLOCK_KEY column of the Data table.
    For further information about the blocking key parameters, see the tGenKey documentation.
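
The way the three parameters chain together (pre-algorithm, then algorithm, then post-algorithm) can be sketched in Python as follows. This is an illustrative approximation under stated assumptions, not the Studio or tGenKey implementation: the function names, the choice of steps (strip diacritics and upper-case, keep the first N characters, fall back to a default value) and the `UNK` default are all hypothetical.

```python
import unicodedata


def pre_algorithm(value):
    """Hypothetical pre-algorithm: standardize the value before key generation
    by removing diacritical marks and converting to upper case."""
    decomposed = unicodedata.normalize("NFD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()


def key_algorithm(value, n):
    """Hypothetical key algorithm: keep the first n characters of the string."""
    return value[:n]


def post_algorithm(key, default):
    """Hypothetical post-algorithm: replace an empty key with a default value."""
    return key if key else default


def build_blocking_key(value, n=3, default="UNK"):
    """Apply the three stages in order: pre-algorithm, algorithm, post-algorithm."""
    return post_algorithm(key_algorithm(pre_algorithm(value), n), default)


# Made-up sample values for illustration only.
for name in ["Müller", "Muller", "muller", ""]:
    print(repr(name), "->", build_blocking_key(name))
```

In this sketch, the pre-algorithm is what lets differently written variants of the same name receive the same key and therefore land in the same block, while the post-algorithm only tidies up the generated key.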