Setting up the Job - 7.0

Data matching

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

About this task

In this scenario, the main input schema is already stored in the Repository. For more information about storing schema metadata in the Repository, see the Talend Studio User Guide.

Procedure

  1. In the Repository tree view, expand Metadata - DB Connections to the connection where you have stored the main input schema, and then drop the database table onto the design workspace. The input table used in this scenario is called customer.
    A dialog box is displayed with a list of components.
  2. Select the relevant database component, tMysqlInput in this example, and then click OK.
  3. Drop two tGenKey components, two tMatchGroup components, a tMap component and a tLogRow component from the Palette onto the design workspace.
  4. Link the input component to the tGenKey and tMap components using Main links.
  5. Before linking the two tMatchGroup components together, select the Output distance details check box in the Advanced settings view of each component.
    This adds the MATCHING_DISTANCES column to the output schema of each tMatchGroup.
    If the two tMatchGroup components are already linked to each other, you must select the Output distance details check box in the second component in the Job flow first; otherwise, you may encounter an issue.
  6. Link the two tMatchGroup components and the tLogRow component using Main links.
  7. If needed, give the components specific labels to reflect their usage in the Job.
    For further information about how to label a component, see the Talend Studio User Guide.
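
For readers who want an intuition for what the key generation and match grouping steps in this Job do, the following is a minimal conceptual sketch in Python. It is not Talend's implementation or generated code. The blocking rule, the similarity measure (Python's difflib), the threshold, the sample rows, and every column name except MATCHING_DISTANCES are assumptions made for illustration only, and the sketch shows a single pass rather than the two passes configured in this Job.

```python
from difflib import SequenceMatcher
from itertools import groupby


def gen_key(row):
    # Hypothetical blocking key: first three letters of the last name, upper-cased.
    return row["lname"][:3].upper()


def match_group(rows, threshold=0.8):
    """Group similar rows inside each block and record distance details."""
    out = []
    gid = 0
    keyed = sorted(rows, key=gen_key)  # groupby needs rows sorted by the key
    for _, block in groupby(keyed, key=gen_key):
        masters = []  # (group id, master row) pairs found so far in this block
        for row in block:
            for g, master in masters:
                score = SequenceMatcher(
                    None, row["fname"].lower(), master["fname"].lower()
                ).ratio()
                if score >= threshold:
                    # The row joins an existing group; keep the distance detail.
                    out.append({**row, "GID": g, "MASTER": False,
                                "MATCHING_DISTANCES": f"fname: {score:.2f}"})
                    break
            else:
                # No master was similar enough, so the row starts a new group.
                gid += 1
                masters.append((gid, row))
                out.append({**row, "GID": gid, "MASTER": True,
                            "MATCHING_DISTANCES": ""})
    return out


# Made-up sample rows standing in for the customer table.
customers = [
    {"id": 1, "fname": "John", "lname": "Smith"},
    {"id": 2, "fname": "Jon", "lname": "Smith"},
    {"id": 3, "fname": "Mary", "lname": "Jones"},
]
result = match_group(customers)
for r in result:
    print(r)
```

In this sketch, rows are compared only with other rows that share the same blocking key, which is the reason a key generation step precedes a grouping step. The distance detail is kept per row so that a downstream step can inspect why a row was attached to its group.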