Writing tasks in a Merging campaign - 7.1

Data Stewardship

author
Talend Documentation Team
EnrichVersion
Cloud
7.1
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Data Stewardship
Talend Studio

This Job loads tasks into a Merging campaign defined in Talend Data Stewardship, according to the criteria you set in the basic settings of the tDataStewardshipTaskOutput component.

The data records in these tasks contain duplicates. Talend Data Stewardship can merge the redundant data and create master records based on trust scores, which you can define when creating the campaign in the application. Once the data is loaded into the campaign, authorized campaign participants can intervene and manually set survivorship rules per attribute in the data records, or enter completely new values when resolving the tasks.

You can also compute the trust score of a given record dynamically, based on business rules embedded in the Job. In that scenario, you provide a trust score for one or more records and map it to the TDS_RATING output column in tDataStewardshipTaskOutput. These trust scores override the scores defined at campaign creation, if any.
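As an illustration of what such business rules might look like, here is a minimal sketch in plain Python, outside of Talend. The rules, the field names (source, email, last_updated), the per-source base scores and the score scale are all assumptions made for this example; they are not taken from the product. In an actual Job, the equivalent logic would be built with the Job's own components and mapped to TDS_RATING.

```python
from datetime import date

# Hypothetical per-source base scores; names and values are illustrative only.
SOURCE_SCORES = {"CRM": 80, "WEB_FORM": 50, "LEGACY": 30}


def trust_score(record, today):
    """Return an illustrative trust score for one record (a dict)."""
    # Unknown sources fall back to a low base score.
    score = SOURCE_SCORES.get(record.get("source"), 10)
    # Assumed rule 1: a record with an email address is more complete.
    if record.get("email"):
        score += 10
    # Assumed rule 2: a record updated within the last 365 days is fresher.
    last_updated = record.get("last_updated")
    if last_updated is not None and (today - last_updated).days <= 365:
        score += 10
    return score


records = [
    {"name": "Ann Lee", "source": "CRM", "email": "ann@example.com",
     "last_updated": date(2019, 1, 10)},
    {"name": "Ann Lee", "source": "LEGACY", "email": "",
     "last_updated": date(2015, 6, 1)},
]
today = date(2019, 6, 1)

# Attach the computed score under the TDS_RATING column name.
rated = [dict(r, TDS_RATING=trust_score(r, today)) for r in records]
```

With these assumed rules, the recent CRM record with an email address receives a higher score than the old legacy record without one, so it would be the more trusted of the two duplicates.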

For more technologies supported by Talend, see Talend components.

This scenario applies only to subscription-based Talend products.

In this Job:

  • The tFileInputDelimited component reads the customer data.

  • The tMatchGroup component compares records using matching and blocking methods and groups the similar records it finds as duplicates.

  • The tMap component maps the group identifier, GID, generated by tMatchGroup to TDS_GID.

    When the input data has a column that holds the names of the data sources, tMap can also map that column to TDS_SOURCE.

  • The tDataStewardshipTaskOutput component writes the data in the CRM Data Deduplication campaign in Talend Data Stewardship.
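
The data flow through the components above can be sketched roughly in plain Python. This is only a conceptual stand-in, not how the components are implemented: the sample data, the column names (name, city, source), the choice of city as blocking key, the exact-match comparison on a normalized name and the "G1"-style group identifiers are all assumptions made for this example. The final write to the campaign is left out.

```python
import csv
import io

# Hypothetical semicolon-delimited customer data with a data-source column.
RAW = """name;city;source
Ann Lee;Paris;CRM
ann  lee;Paris;WEB_FORM
Bob Ray;Lyon;CRM
"""


def match_key(row):
    # Stand-in for matching: compare names ignoring case and extra spaces.
    return " ".join(row["name"].lower().split())


def group_duplicates(rows):
    """Block on city, then give rows with equal match keys the same GID."""
    gids = {}
    grouped = []
    for row in rows:
        key = (row["city"], match_key(row))  # (blocking key, matching key)
        gid = gids.setdefault(key, "G{}".format(len(gids) + 1))
        grouped.append(dict(row, GID=gid))
    return grouped


def to_task(row):
    # Mapping step: GID becomes TDS_GID, the source column becomes TDS_SOURCE.
    return {
        "name": row["name"],
        "city": row["city"],
        "TDS_GID": row["GID"],
        "TDS_SOURCE": row["source"],
    }


# Read the delimited input, group the duplicates, then map the columns.
rows = list(csv.DictReader(io.StringIO(RAW), delimiter=";"))
tasks = [to_task(r) for r in group_duplicates(rows)]
```

In this sketch the two "Ann Lee" rows end up with the same TDS_GID, so they would be presented together as one group of duplicates to merge, while "Bob Ray" forms a group of its own.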