Procedure
-
Double-click tMatchGroup to open the configuration wizard
where you can define the match rule.
-
In the Key Definition table, define which match algorithms to use and
on which columns. Similarly, in the Blocking Selection table, select the column to use as a
blocking value. Blocking reduces the number of pairs that need to be
examined.
For further information, see tMatchGroup.
- Click the Chart button to display the matching results in the wizard, then click OK.
-
In the component properties, click Advanced settings and make sure the Sort output data by GID check box is selected.
Note: If this option is not enabled, potential duplicates could be grouped in different tasks when loaded to Talend Data Stewardship.
-
Double-click tMap to open its editor.
-
Map the input data flow to the output flow, and map
the GID and MASTER
columns to TDS_GID and TDS_MASTER
respectively.
For further information about tMap, see tMap Standard properties.
-
When data comes from a single source, enter the source name
(CRM in this example) for the TDS_SOURCE column in the right-hand
table. Make sure
that the source name does not contain dots and that it does not start with a
dollar sign.
If you do not specify a source name, Source 1, Source 2 and so on are added by default.
-
If you need to store the matching results in an external
system, map GID to TDS_EXTERNAL_ID.
This helps you reference a given task from the external system.
-
When data comes from different sources and the input schema
has a column that holds the source names, map that column to TDS_SOURCE.
If you do not specify the source names, Source 1, Source 2 and so on are added by default.
If you specify the same name for multiple sources of the same task, the suffixes -1, -2 and so on are added by default. For example, if you create a task with three sources named SAP, the source names in Talend Data Stewardship are displayed as SAP, SAP - 1, SAP - 2.
You can also dynamically compute the trust scores of specific records if you provide them at the task source level and map them to the TDS_RATING output column in tDataStewardshipTaskOutput. These trust scores override any scores defined at campaign creation.
Make sure that the source names in the input file do not contain dots and that they do not start with a dollar sign.
- Click OK.
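The naming constraints on TDS_SOURCE values described above (no dots, no leading dollar sign) can be checked before the data reaches the Job. The following is a minimal sketch in Python, not part of Talend: the helper name check_source_name and the sample source names are hypothetical and used only for illustration.

```python
def check_source_name(name):
    """Return a list of problems found in a candidate TDS_SOURCE value.

    Hypothetical helper: it encodes only the two rules stated in this
    procedure (no dots, no leading dollar sign).
    """
    problems = []
    if "." in name:
        problems.append("contains a dot")
    if name.startswith("$"):
        problems.append("starts with a dollar sign")
    return problems


# Hypothetical source names, as they might appear in an input file.
candidates = ["CRM", "SAP", "crm.eu", "$legacy"]

# Keep only the names that break at least one rule.
invalid = {}
for name in candidates:
    problems = check_source_name(name)
    if problems:
        invalid[name] = problems

print(invalid)
```

Running a check like this on the source column before loading lets you correct or rename offending values in the input, rather than discovering them once tasks are created.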