Setting up the input records - Cloud - 8.0

Data matching with Talend tools

Version
Cloud
8.0
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-06

Procedure

  1. Double-click tFixedFlowInput to open its Component view.
  2. Click the [...] button next to Edit schema to open the schema editor.
  3. Click the plus button and add five rows.
    Rename these rows, in order, to Record_ID, File, Acctname, GRP_ID and GRP_SIZE.
    The input data includes a group ID and a group size for each record. In a real-life scenario, this information can be gathered by the tMatchGroup component, as shown in scenario 1. tMatchGroup groups duplicates in the input data and gives each group a group ID and a group size. tRuleSurvivorship requires these two columns.
  4. In the Type column, select the data types for your columns. In this example, set the type to Integer for Record_ID and GRP_SIZE, and set it to String for the other columns.
    Note: Make sure to set the proper data type for each column so that you can define the validation rules without error messages.

  5. Click OK to validate these changes and accept the propagation when prompted by the pop-up dialog box.
  6. In the Mode area of the Basic settings view, select Use Inline Content (delimited file).
  7. In the Content field, enter the input data to be processed.
    This data should correspond to the schema you have defined. In this example, the input data is as follows:
    1;2;AcmeFromFile2;1;2
    2;1;AcmeFromFile1;1;0
    3;1;AAA;2;1
    4;2;BBB;3;1
    5;1;  ;4;2
    6;2;NotNull;4;0
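  8. Set the row and field separators in the corresponding fields.
    The sample data above puts one record on each line and separates fields with a semicolon, so the separators must match that layout (a newline for rows and ";" for fields).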
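
To sanity-check the inline content before pasting it into the Content field, the schema and sample data from the steps above can be reproduced outside Talend Studio. The following Python sketch is not part of Talend and is not how tFixedFlowInput works internally. The helper names (parse_inline_content, group_counts) are hypothetical, and the separators (";" for fields, a newline for rows) are assumed from the sample data. It only illustrates how each delimited row maps onto the five schema columns and their types.

```python
# Illustrative sketch only: mirrors the schema and inline content from the
# procedure above. Helper names are hypothetical, not Talend APIs.

# Column names and types as defined in the schema editor (steps 3 and 4):
# Integer for Record_ID and GRP_SIZE, String for the other columns.
SCHEMA = [
    ("Record_ID", int),
    ("File", str),
    ("Acctname", str),
    ("GRP_ID", str),
    ("GRP_SIZE", int),
]

# Sample inline content from step 7, one record per line.
CONTENT = """1;2;AcmeFromFile2;1;2
2;1;AcmeFromFile1;1;0
3;1;AAA;2;1
4;2;BBB;3;1
5;1;  ;4;2
6;2;NotNull;4;0"""


def parse_inline_content(content, schema, field_sep=";", row_sep="\n"):
    """Split delimited text into rows and cast each field to its schema type."""
    records = []
    for line in content.split(row_sep):
        fields = line.split(field_sep)
        if len(fields) != len(schema):
            raise ValueError(
                f"expected {len(schema)} fields, got {len(fields)}: {line!r}"
            )
        # int() raises ValueError if an Integer column holds non-numeric text.
        records.append(
            {name: cast(value) for (name, cast), value in zip(schema, fields)}
        )
    return records


def group_counts(records):
    """Count how many records share each GRP_ID."""
    counts = {}
    for rec in records:
        counts[rec["GRP_ID"]] = counts.get(rec["GRP_ID"], 0) + 1
    return counts


records = parse_inline_content(CONTENT, SCHEMA)
counts = group_counts(records)
```

A row with a missing field, or with text in an Integer column, raises a ValueError here, which is the kind of mismatch the note in step 4 warns about. In this sample, counts shows two records each in groups 1 and 4 and one record each in groups 2 and 3. The Acctname of record 5 contains only spaces and is kept as entered.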