Configuring key generation - 7.1

Identification

author
Talend Documentation Team
EnrichVersion
7.1
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

Procedure

  1. Double-click tGenKey to display the Basic settings view and define the component properties.
    You can import blocking keys from match rules that were created with the VSR algorithm and tested in the Profiling perspective of Talend Studio, and then use them in your Job. Otherwise, define the blocking key parameters as described in the steps below.
  2. Under the Algorithm table, click the [+] button to add a row in this table.
  3. In the column column, click the newly added row and select from the list the column you want to process with an algorithm. In this example, select DoB.
  4. In the algorithm column, click the newly added row and select from the list the algorithm you want to apply to the corresponding column. In this example, select substring(a,b).
  5. Click in the value column and enter the value for the selected algorithm, if one is required. In this example, enter 6;10.
    The substring(a,b) algorithm extracts the characters of a string between two specified indices and returns them as a new substring. The first character is at index 0. In this example, for a DoB of "21-01-1995", the value 6;10 returns only the year of birth, "1995", which is the substring from the 7th to the 10th character (that is, from index 6 up to, but not including, index 10).
    The goal in this example is to generate, for each data row, a functional key that holds the last four characters of the date of birth, which correspond to the year of birth. No extra options are defined on the column.
    You can select the Show help check box to display instructions on how to set algorithms/options parameters.
    Once you have defined the tGenKey properties, you can display a statistical view of these parameters. To do so:
  6. Right-click the tGenKey component and select View Key Profile in the contextual menu.
    The View Key Profile editor opens. It displays statistics on the number of blocks and lets you adjust the parameters according to the results you want to get.
    Note:

    When you process a large amount of data and use this component to partition the data for a matching component (such as tRecordMatching or tMatchGroup), it is preferable to keep the number of rows in each block limited. About 50 rows per block is considered optimal, but the right figure depends on the number of fields to compare, the total number of rows, and the processing time you consider acceptable.

    From the key editor, you can:
    • edit the Limit of rows used to calculate the statistics.

    • import blocking keys from the Studio repository and use them in your Job.

    • edit the input column you want to process using an algorithm.

    • edit the parameters of the algorithm you want to apply to input columns.

    Every time you make a modification, you can see its effect by clicking the Refresh button at the top right of the editor.
  7. Click OK to close the View Key Profile editor.
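
The key generation configured above can be approximated outside the Studio with the short Python sketch below. It is an illustration only, not the tGenKey implementation. It assumes that substring(a,b) uses a 0-based start index and an exclusive end index, which is inferred from the 6;10 example. The names substring_key and build_blocks and the sample rows are hypothetical. The sketch builds one key per row, groups rows that share a key into blocks, and computes the kind of block statistics that the View Key Profile editor helps you assess.

```python
from collections import defaultdict


def substring_key(value: str, start: int, end: int) -> str:
    """Return the characters from index `start` (0-based, inclusive)
    up to index `end` (exclusive). Assumed semantics, see lead-in."""
    return value[start:end]


def build_blocks(rows, column, start, end):
    """Group rows into blocks that share the same generated key."""
    blocks = defaultdict(list)
    for row in rows:
        key = substring_key(row[column], start, end)
        blocks[key].append(row)
    return dict(blocks)


# Hypothetical sample data; only the DoB format comes from the scenario.
rows = [
    {"name": "A", "DoB": "21-01-1995"},
    {"name": "B", "DoB": "03-07-1995"},
    {"name": "C", "DoB": "15-11-1988"},
]

# Equivalent of selecting DoB, substring(a,b) and the value 6;10.
blocks = build_blocks(rows, "DoB", 6, 10)

# Simple block statistics: rows per block and the average block size.
block_sizes = {key: len(members) for key, members in blocks.items()}
average_rows_per_block = sum(block_sizes.values()) / len(block_sizes)

print(substring_key("21-01-1995", 6, 10))  # 1995
print(block_sizes)                         # {'1995': 2, '1988': 1}
print(average_rows_per_block)              # 1.5
```

On real data, comparing the block sizes with the target of about 50 rows per block mentioned in the note shows whether the key is too coarse (a few very large blocks) or too fine (many blocks holding a single row).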