Configuring the components - Cloud - 8.0

Data matching with Talend tools

Version
Cloud
8.0
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-06

Procedure

  1. Double-click the tFileInputDelimited component to display its Basic settings view.
    Important: The dynamic schema feature is only supported in Built-In mode and requires the input file to have a header row.
  2. Click the [...] button next to the File Name/Stream field to browse to your input file.
  3. Define the header and footer rows.
    In this example, the first row of the input file is the header row.
  4. Click Edit schema to define the schema for this component.
    In this example, the input file has five columns: FirstName, LastName, HouseNo, Street, and City. Because the dynamic schema feature can carry all of them, define a single dynamic column in the schema, named Dyna in this example.
    1. Add a new line by clicking the [+] button.
    2. Type Dyna in the Column field.
    3. Select Dynamic from the Type list.
    4. Click OK.
  5. Double-click the tExtractDynamicFields component to display its Basic settings view.
    Use this component to split the dynamic column of the input schema into two columns: one for the first name and the other for the family-related information. To do so:
    1. Click Edit schema to open the Schema dialog box.
    2. In the output panel, click the [+] button to add two columns for the output schema, and name them FirstName and FamilyInfo respectively.
    3. Select String from the Type list for the FirstName column.
      This extracts the FirstName column from the dynamic column of the input schema so that it carries the first name of each person on the name list.
    4. Select Dynamic from the Type list for the FamilyInfo column.
      This column carries the remaining information for each person on the name list: the last name, house number, street, and city, which together identify a family.
    5. Click OK to propagate the schema and close the Schema dialog box.
  6. Double-click the tUniqRow component to display its Basic settings view.
  7. In the Unique key area, select the Key attribute check box for the FamilyInfo column.
    This deduplicates the records based on the family information.
  8. Double-click the tFileOutputDelimited component to display its Basic settings view.
  9. Define the output file path and select the Include header check box.
  10. Leave the other settings as they are.
  11. In the Basic settings view of the tLogRow component, select the Table option to view the Job execution result in table mode.
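
To make the data flow configured above easier to picture, the following plain-Java sketch mimics it on in-memory lines: the header row is skipped, the first column is treated as FirstName, the remaining columns are treated as FamilyInfo, and only the first row seen for each distinct FamilyInfo value is kept. This is an illustration only, not the code Talend Studio generates. The class name FamilyDedupSketch, the method dedupeByFamily, the semicolon separator, and the sample records are all hypothetical, and the sketch assumes an exact, case-sensitive comparison of the family columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class FamilyDedupSketch {

    // Hypothetical helper, not a Talend API: keeps the first row seen for each
    // distinct combination of the columns that follow the first column.
    static List<List<String>> dedupeByFamily(List<String> lines, String separator) {
        // LinkedHashMap preserves the order in which families are first seen.
        Map<List<String>, List<String>> firstRowPerFamily = new LinkedHashMap<>();
        // Start at index 1: the first line is the header row.
        for (int i = 1; i < lines.size(); i++) {
            List<String> fields =
                Arrays.asList(lines.get(i).split(Pattern.quote(separator), -1));
            // Column 0 plays the role of FirstName; the rest plays the role of FamilyInfo.
            List<String> familyInfo = new ArrayList<>(fields.subList(1, fields.size()));
            // Assumption: exact string equality decides whether two families match.
            firstRowPerFamily.putIfAbsent(familyInfo, fields);
        }
        return new ArrayList<>(firstRowPerFamily.values());
    }

    public static void main(String[] args) {
        // Made-up sample data using the five column names from this example.
        List<String> lines = Arrays.asList(
            "FirstName;LastName;HouseNo;Street;City",
            "Anna;Smith;12;Main Street;Springfield",
            "Ben;Smith;12;Main Street;Springfield",
            "Carla;Jones;7;Oak Avenue;Shelbyville");
        for (List<String> row : dedupeByFamily(lines, ";")) {
            System.out.println(String.join("|", row));
        }
    }
}
```

With the sample data, Anna and Ben share the same last name, house number, street, and city, so only the first of the two rows is kept, alongside the row for Carla. Because the key is built from whatever columns follow the first one, the sketch does not need to know their names in advance, which is the same convenience the dynamic column provides in the Job.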