Setting up the input records

Deduplication

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Big Data Platform
Talend Big Data
Talend Open Studio for Big Data
Talend Data Management Platform
Talend Real-Time Big Data Platform
Talend Data Integration
Talend ESB
Talend Data Services Platform
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend MDM Platform
Talend Data Fabric
Talend Open Studio for MDM
task
Design and Development > Third-party systems > Data Quality components > Deduplication components
Data Governance > Third-party systems > Data Quality components > Deduplication components
Data Quality and Preparation > Third-party systems > Data Quality components > Deduplication components
EnrichPlatform
Talend Studio

Procedure

  1. Double-click tFixedFlowInput to open its Component view.
  2. Click the three-dot button next to Edit schema to open the schema editor.
  3. Click the plus button and add five rows.
    Name these rows, in order: Record_ID, File, Acctname, GRP_ID and GRP_SIZE.
    The input data includes a group ID and a group size for each record. In a real-life scenario, this information can be gathered by the tMatchGroup component, as shown in scenario 1. tMatchGroup groups duplicates in the input data and gives each group a group ID and a group size. These two columns are required by tRuleSurvivorship.
  4. In the Type column, select the data types for your columns. In this example, set the type to Integer for Record_ID and GRP_SIZE, and set it to String for the other columns.
    Note:

    Make sure to set the proper data type so that you can define the validation rules without error messages.

  5. Click OK to validate these changes and accept the propagation when prompted by the pop-up dialog box.
  6. In the Mode area of the Basic settings view, select Use Inline Content (delimited file).
  7. In the Content field, enter the input data to be processed.
    This data must correspond to the schema you have defined. In this example, the input data is as follows:
    1;2;AcmeFromFile2;1;2
    2;1;AcmeFromFile1;1;0
    3;1;AAA;2;1
    4;2;BBB;3;1
    5;1;  ;4;2
    6;2;NotNull;4;0
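  8. Set the row and field separators in the corresponding fields. The sample data in this example puts each record on its own line and separates fields with a semicolon (;).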
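
To sanity-check the inline content before running the Job, the schema and sample data above can be mirrored outside the Studio. The following Python sketch is only an illustration and is not how tFixedFlowInput works internally. The helper names parse_records and group_sizes_consistent are hypothetical, the separators (newline and semicolon) are assumed from the sample data, and the group-size check reflects a pattern observed in this sample (one row per group carries the group size, the others carry 0) rather than a documented rule.

```python
from collections import defaultdict

# Schema from steps 3 and 4: column name plus a Python type standing in
# for the Talend type (Integer -> int, String -> str).
SCHEMA = [
    ("Record_ID", int),
    ("File", str),
    ("Acctname", str),
    ("GRP_ID", str),
    ("GRP_SIZE", int),
]

# Inline content from step 7, one record per line.
CONTENT = """1;2;AcmeFromFile2;1;2
2;1;AcmeFromFile1;1;0
3;1;AAA;2;1
4;2;BBB;3;1
5;1;  ;4;2
6;2;NotNull;4;0"""


def parse_records(content, row_sep="\n", field_sep=";"):
    """Split content into rows and fields, casting each field per SCHEMA.

    Hypothetical helper: raises ValueError if a row has the wrong number
    of fields or an Integer column holds a non-numeric value.
    """
    records = []
    for line in content.split(row_sep):
        fields = line.split(field_sep)
        if len(fields) != len(SCHEMA):
            raise ValueError(
                f"expected {len(SCHEMA)} fields, got {len(fields)}: {line!r}"
            )
        records.append(
            {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}
        )
    return records


def group_sizes_consistent(records):
    """Return True if the GRP_SIZE values in each group sum to its row count.

    This mirrors the pattern visible in the sample data; it is an
    assumption, not a rule taken from the component documentation.
    """
    row_count = defaultdict(int)
    declared = defaultdict(int)
    for rec in records:
        row_count[rec["GRP_ID"]] += 1
        declared[rec["GRP_ID"]] += rec["GRP_SIZE"]
    return all(row_count[g] == declared[g] for g in row_count)


records = parse_records(CONTENT)
print(len(records), "records;",
      "group sizes consistent:", group_sizes_consistent(records))
```

Note that the fifth record keeps its whitespace-only Acctname value as is, which matches the content entered in the Content field.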