Configuring the components - Cloud - 8.0

Data matching with Talend tools

Version
Cloud
8.0
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-06

Procedure

  1. Define the first tFixedFlowInput in its Basic settings view.
    In this example, you use the following input:
    FirstName;City
    Brad;Los angeles
    Jason;New York
    Margaret;
    Kourtney;Seattle
    Nicole;Saint-Louis
    John;Denver
  2. Define the schema of the component. In this example, the input schema has two columns: FirstName and City.
  3. Define the second tFixedFlowInput.
    In this example, you use the following input:
    FirstName;City
    Brad;Los Angeles
    Jason;New York
    Margaret;Dallas
    Courtney;Seattle
    Nicole;Saint-Louis
    Jon;Denver
  4. Set the reference column as the key column in the schema of the lookup flow.
  5. Double-click the tFuzzyMatch component to open its Basic settings view, and check its schema.
    The schema should match the Main input flow schema so that the main flow can be checked against the reference.
    Note that two columns, Value and Matching, are added to the output schema. These columns hold standard matching information and are read-only.
  6. Select the method to be used to check the incoming data. In this example, Levenshtein is the Matching type to be used.
  7. Set the distance.
    With this method, the distance is the number of character changes (insertions, deletions, or substitutions) that need to be carried out for the entry to fully match the reference.
    In this example, you set both the minimum distance and the maximum distance to 0. This means that only exact matches are output.
  8. Clear the Case sensitive check box.
  9. Select the matching column and the lookup column. In this example, both are the FirstName column.
  10. Leave the other parameters as default.
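
To make the effect of the distance settings easier to see, the following is a minimal Python sketch of the matching logic described in the steps above. It is not the tFuzzyMatch implementation: the `levenshtein` and `fuzzy_match` functions are hypothetical helpers written for illustration, and the way the `Value` and `Matching` columns are filled here (distance and matched reference entry, respectively) is an assumption. The sample rows are the two inputs from this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def fuzzy_match(main_rows, reference_rows, column,
                min_distance, max_distance, case_sensitive=False):
    """Hypothetical helper: keep main rows whose closest reference value
    lies within [min_distance, max_distance]."""
    output = []
    for row in main_rows:
        best = None  # (distance, reference value) of the closest accepted match
        for ref in reference_rows:
            left, right = row[column], ref[column]
            if not case_sensitive:
                left, right = left.lower(), right.lower()
            distance = levenshtein(left, right)
            if min_distance <= distance <= max_distance:
                if best is None or distance < best[0]:
                    best = (distance, ref[column])
        if best is not None:
            # Assumption: Value holds the distance, Matching the reference entry.
            output.append({**row, "Value": best[0], "Matching": best[1]})
    return output


# Main flow (first tFixedFlowInput) and lookup flow (second tFixedFlowInput).
MAIN = [
    {"FirstName": "Brad", "City": "Los angeles"},
    {"FirstName": "Jason", "City": "New York"},
    {"FirstName": "Margaret", "City": ""},
    {"FirstName": "Kourtney", "City": "Seattle"},
    {"FirstName": "Nicole", "City": "Saint-Louis"},
    {"FirstName": "John", "City": "Denver"},
]
REFERENCE = [
    {"FirstName": "Brad", "City": "Los Angeles"},
    {"FirstName": "Jason", "City": "New York"},
    {"FirstName": "Margaret", "City": "Dallas"},
    {"FirstName": "Courtney", "City": "Seattle"},
    {"FirstName": "Nicole", "City": "Saint-Louis"},
    {"FirstName": "Jon", "City": "Denver"},
]

# Minimum and maximum distance both set to 0, Case sensitive cleared.
for matched in fuzzy_match(MAIN, REFERENCE, "FirstName", 0, 0):
    print(matched["FirstName"], matched["Value"], matched["Matching"])
```

With both distances set to 0, the sketch keeps only Brad, Jason, Margaret, and Nicole. Kourtney and John each differ from their reference entries (Courtney and Jon) by one character change, so in this sketch they would only be kept if the maximum distance were raised to 1.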