Talend Data Stewardship Getting Started Guide - 6.4

Author: Talend Documentation Team
Version: 6.4
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Tasks: Data Governance > Managing campaigns; Data Quality and Preparation > Deduplicating data; Data Quality and Preparation > Reconciling data
Platform: Talend Data Stewardship

Talend Data Stewardship is a tool you can use to manage data assets. It organizes interactions with data whenever human intervention is required for collaborative data curation, arbitration or validation.

The core concepts of Talend Data Stewardship are campaigns and tasks. It comes with two predefined roles: campaign owner and data steward.

A campaign is the main unit of work for campaign owners. It contains all the required configuration assets that are determined by the campaign owner:
  • what are the tasks about (data structure, validation constraints, etc.)?

  • what do data stewards have to do to resolve the campaign tasks (task type)?

  • which data stewards work on the campaign tasks (campaign participants)?

  • how do data stewards collaborate to resolve the campaign tasks (campaign workflow)?
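To make these configuration assets concrete, the following minimal Python sketch shows one way a campaign definition could be modeled. The `Campaign` class, its field names, the `validate` helper, the constraint functions and the workflow state names are hypothetical illustrations, not the Talend Data Stewardship API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model of a campaign definition; not the Talend Data Stewardship API.
@dataclass
class Campaign:
    name: str
    # What the tasks are about: field names mapped to a validation constraint.
    data_model: Dict[str, Callable[[str], bool]]
    # What data stewards have to do (task type), for example "MERGING".
    task_type: str
    # Which data stewards work on the tasks (campaign participants).
    participants: List[str] = field(default_factory=list)
    # How data stewards collaborate (campaign workflow), as ordered state names.
    workflow: List[str] = field(default_factory=list)

    def validate(self, record: Dict[str, str]) -> List[str]:
        """Return the names of fields that are missing or violate their constraint."""
        return [name for name, check in self.data_model.items()
                if name not in record or not check(record[name])]


campaign = Campaign(
    name="Customer merge",
    data_model={"email": lambda v: "@" in v, "zip": str.isdigit},
    task_type="MERGING",
    participants=["steward_a", "steward_b"],
    workflow=["New", "To validate", "Resolved"],  # hypothetical state names
)
print(campaign.validate({"email": "jane.doe@example.com", "zip": "9410A"}))  # ['zip']
```

Keeping the constraints next to the field definitions mirrors the idea that the campaign owner, not each data steward, decides what a valid record looks like.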

A task is the main unit of work for data stewards. A task belongs to a campaign and is assigned to a data steward. It has a lifecycle where it passes through different states according to the workflow defined in the campaign.
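The task lifecycle can be pictured as a small state machine in which the campaign workflow lists the transitions a task is allowed to make. The Python sketch below assumes a hypothetical three-state workflow (`New`, `To validate`, `Resolved`); the state names, the `WORKFLOW` table and the `Task` class are illustrations, not Talend's built-in workflow or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical workflow: each state maps to the states a task may move to next.
WORKFLOW: Dict[str, List[str]] = {
    "New": ["To validate"],
    "To validate": ["Resolved", "New"],  # a reviewer can accept or send back
    "Resolved": [],                      # terminal state
}

@dataclass
class Task:
    campaign: str
    assignee: Optional[str]
    state: str = "New"

    def transition(self, target: str) -> None:
        """Move the task to `target`, rejecting transitions the workflow forbids."""
        allowed = WORKFLOW.get(self.state, [])
        if target not in allowed:
            raise ValueError(f"cannot move from {self.state!r} to {target!r}")
        self.state = target


task = Task(campaign="Customer merge", assignee="steward_a")
task.transition("To validate")
task.transition("Resolved")
print(task.state)  # Resolved
```

Rejecting undeclared transitions is what makes the workflow, rather than the individual data steward, decide how a task can progress.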

One of the things you can do with Talend Data Stewardship is match, cleanse and master data using a Merging campaign.

The use case described here uses a Merging campaign to illustrate how to reconcile data coming from different sources. However, Talend Data Stewardship also supports other campaign types: Arbitration, Resolution and Grouping.

In a Merging campaign, you must consider two aspects:
  • How do you identify the match groups, that is, the groups of potentially duplicate records? This is done using a Talend Job in the Studio.
  • How do you pick the best attribute values from the data sources and present the most accurate and reliable master records for consumption by users and systems? This is done through the Merging campaign in the web application.
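The two questions can be illustrated with a short Python sketch under deliberately simple assumptions. Records are grouped on an exact, normalized email key, which stands in for the matching that the Talend Job performs in the Studio (not detailed in this guide), and each master-record attribute keeps the most frequent non-empty value, a hypothetical survivorship rule; in a Merging campaign, data stewards make or review these choices themselves. The functions `match_key`, `group_records` and `merge_group` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Record = Dict[str, str]

def match_key(record: Record) -> str:
    """Hypothetical match rule: same normalized email means candidate duplicates."""
    return record["email"].strip().lower()

def group_records(records: List[Record]) -> Dict[str, List[Record]]:
    """Collect records into match groups keyed by their match key, in input order."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return dict(groups)

def merge_group(group: List[Record], attributes: List[str]) -> Record:
    """Build one master record: keep the most frequent non-empty value per attribute."""
    master: Record = {}
    for attr in attributes:
        counts = Counter(r[attr] for r in group if r.get(attr))
        # Counter keeps first-seen order and max() returns the first maximum,
        # so on a tie the value seen first in the group wins.
        master[attr] = max(counts, key=counts.get) if counts else ""
    return master


records = [
    {"id": "crm-1", "name": "Jane Doe", "email": "Jane.Doe@Example.com ", "phone": ""},
    {"id": "erp-7", "name": "J. Doe", "email": "jane.doe@example.com", "phone": "555-0100"},
    {"id": "web-3", "name": "Jane Doe", "email": "jane.doe@example.com", "phone": "555-0100"},
    {"id": "crm-2", "name": "John Smith", "email": "john.smith@example.com", "phone": "555-0199"},
]

for key, group in group_records(records).items():
    print(key, len(group), merge_group(group, ["name", "email", "phone"]))
# jane.doe@example.com 3 {'name': 'Jane Doe', 'email': 'jane.doe@example.com', 'phone': '555-0100'}
# john.smith@example.com 1 {'name': 'John Smith', 'email': 'john.smith@example.com', 'phone': '555-0199'}
```

Real matching usually relies on fuzzy comparisons rather than one exact key, which is why the grouping step is delegated to a Job and the final attribute choices to human review.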

This use case helps you get started with Talend Data Stewardship. To replicate the example with the same client data, we assume that:

  • An administrator has installed and launched Talend Data Stewardship. For more information, see the Talend Administration Center Installation Guide.
  • An administrator has created Talend Data Stewardship users and assigned them roles in Talend Administration Center. For further information, see Creating Data Stewardship users.

  • A campaign owner has downloaded the input data and the Talend Job used in this example. These are used to load tasks into the Merging campaign once it is created.

    Retrieve the tds_gettingstarted_source_files.zip file from the Downloads tab in the left panel of this page.