Double-click tMatchGroup to
open the configuration wizard where you can define the match rule.
In the Key Definition table,
define what match algorithms to use and on what columns. Similarly, in the
Blocking Selection table, select what
column to use as a blocking value in order to reduce the number of pairs that
need to be examined.
For further information, see tMatchGroup.
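To illustrate why a blocking value reduces the number of pairs to examine, here is a minimal sketch in plain Java. It is not Talend code and does not reproduce how tMatchGroup works internally. The class name, method names and sample city values are hypothetical. The sketch assumes that records are only compared with other records that share the same blocking value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlockingPairCount {

    // Number of unordered pairs among n records: n * (n - 1) / 2.
    public static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    // Total pairs when records are compared only within their own block
    // (assumption: one block per distinct blocking value).
    public static long blockedPairCount(List<String> blockingValues) {
        Map<String, Long> blockSizes = new HashMap<>();
        for (String value : blockingValues) {
            blockSizes.merge(value, 1L, Long::sum);
        }
        long total = 0;
        for (long size : blockSizes.values()) {
            total += pairCount(size);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical blocking column: six records spread over three cities.
        List<String> cities = List.of("Paris", "Paris", "Paris", "Lyon", "Lyon", "Nice");
        System.out.println(pairCount(cities.size()));  // 15 pairs without blocking
        System.out.println(blockedPairCount(cities));  // 3 + 1 + 0 = 4 pairs with blocking
    }
}
```

With six records, comparing every record with every other record gives 15 pairs, while comparing only within each block gives 4. The trade-off is that records placed in different blocks are never compared, so the blocking column should be one where true duplicates are likely to share the same value.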
- Click the Chart button to display the matching results in the wizard, and then click OK.
In the component properties, click Advanced settings and
make sure the Sort output data by GID check box is selected.
Note: If this option is not enabled, potential duplicates could be grouped in different tasks when loaded to Talend Data Stewardship.
Double-click tMap to open
Map the input data flow to the output flow and the GID and MASTER columns to
For further information about tMap, see tMap Standard properties.
When data comes from a single source, enter the source name for the
TDS_SOURCE column in the right-hand table,
CRM in this example. Make sure that the source name
does not contain dots and that it does not start with a dollar sign.
If you do not specify a source name, Source 1, Source 2 and so on are added by default.
If you need to store the matching results in an external system, map
GID to TDS_EXTERNAL_ID.
This helps you reference a given task from the external system.
When data comes from different sources and the input schema has a column
that holds the source names, map the source column to
If you do not specify the source names, Source 1, Source 2 and so on are added by default.
If you specify the same name for multiple sources of the same task, the suffixes -1, -2 and so on are added by default. For example, if you create a task with three sources that are all named SAP, the source names in Talend Data Stewardship are displayed as SAP, SAP - 1, SAP - 2.
Make sure that the source names in the input file do not contain dots and that they do not start with a dollar sign.
- Click OK.
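The source name constraints stated above (no dots, no leading dollar sign) can be checked before the data is loaded. The following is a minimal sketch in plain Java. The class and method names are hypothetical and are not part of any Talend API. It checks only the two constraints mentioned in this procedure.

```java
public class SourceNameCheck {

    // Hypothetical helper: returns true when the name contains no dot
    // and does not start with a dollar sign. A null name returns false.
    public static boolean isValidSourceName(String name) {
        return name != null
                && !name.contains(".")
                && !name.startsWith("$");
    }

    public static void main(String[] args) {
        System.out.println(isValidSourceName("CRM"));      // true
        System.out.println(isValidSourceName("crm.prod")); // false: contains a dot
        System.out.println(isValidSourceName("$CRM"));     // false: starts with a dollar sign
    }
}
```

A check like this could be run on the values of the source column in the input file, so that invalid names are corrected before the tasks are created.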