Merging tasks aim to merge several potential duplicates into a
single record: the master record. Potential duplicates can come from the same source (data
deduplication) or from different sources (data reconciliation).
In a Merging campaign, you can modify values only in the master
fields; values in the source fields cannot be modified.
Merging data values and validating your modifications transitions the task
to the second state defined in the workflow. The workflow defined when the campaign
is created determines which states are available to which data stewards. A task
cannot be validated, or even marked as ready, as long as it contains at least one invalid
value.
About this task
In this example, the duplicate customer records come
from the same source (an enterprise CRM).
Talend Cloud Data Stewardship
initially determines which attributes of the matched records are used to create the master
record, according to the survivorship rules defined when the campaign was created. However,
you may need to change the survivorship rule applied to individual record attributes, or enter
completely new values, to obtain the most accurate and reliable master records.
Procedure
-
On the Tasks page, click the campaign name,
CRM Data Deduplication in this example, to open a
list of the tasks assigned to you.
-
Use the quality bar at the top of each column, or the Chart or
Pattern views in the right-hand panel, to filter the data you want to work on.
-
Click the down arrow in the top-left corner to expand all tasks in the list, or
click the down arrow of a specific task to expand only that task.
-
Set survivorship rules to select attributes from customer records and use them
to build the master records. Several approaches are possible.
-
Set a survivorship rule manually for one attribute of multiple
records.
- Click a column heading, First_Name for
example, and in the right-hand panel browse to the
Survivorship section.
- Click Apply survivorship rule and, from the
Rule list, select Most
common as the survivorship rule to apply to the
name attribute in all the customer records.
If you defined
the sources of the duplicate data in the Merging campaign, the source names are included in the list and can
be selected as the survivorship rule to apply to the column
values.
- To apply the rule to all name values, including null
ones, clear the Avoid null values check box;
otherwise, leave it selected.
- Click Submit to select the most common name
values and add them to the master records of the tasks.
-
Set a survivorship rule manually for all attributes of one or multiple
golden records.
- Select the tasks for which to set the rule, and under
Task in the right-hand panel click
Apply survivorship rule.
- From the Selection list, select
Selected tasks.
Alternatively, you can apply the rule to
all tasks, or only to the filtered tasks if you have defined a filter
on the list.
- From the Rule list, select the rule to apply
to the selected tasks, Most trusted for example.
- To apply the rule to all values, including null ones,
clear the Avoid null values check box;
otherwise, leave it selected.
- Click Submit to add the values with the
highest score to the selected golden records.
-
Set a survivorship rule manually for one or several attributes of a
record: expand the task, hover over an attribute in the master record
of the task, and select the survivorship rule you want to apply
from the icons that are displayed.
-
First value icon: selects the first valid attribute value among the
duplicates. "First" is defined by the order of the
records when the task is created.
-
Most common icon: selects the most common attribute value among the
duplicates.
-
Most recent icon: selects the most recent attribute value among the
duplicates.
-
Most trusted icon: selects the most trusted attribute value among the
duplicates coming from different sources.
Icons are grayed out when a rule does not apply to the
selected attribute. In this example, the most trusted icon
is unavailable because the customer data comes
from a single source, the CRM.
-
Select the value of a given source attribute as the value of the
master record: point to the source attribute and click the up arrow to copy the
selected value to the master record.
-
Optionally, click the link in the Email column to
open a new window and email the customer about any information in the
customer data record that needs to be validated.
Note: Email addresses are displayed as hyperlinks only if you set the semantic
type of the Email column to MailTo
URL when defining the data model for the campaign.
-
Repeat the above steps to merge records and create master records for all the
tasks assigned to you.
If a given column contains values that need to be fixed, you can transform
them in bulk by using the functions listed in the right-hand panel.
-
Click the lock icon next to the data record you modified to mark the task as ready to
be validated.
If the lock icon has a red background, you must first correct the
invalid value in the task before you can mark it as ready to be
validated.
The record is marked with a green background and the lock icon
automatically moves to the next record. You can still modify records that are ready
to be validated, but doing so puts the task back in its initial state, with a
dark gray background. Click the lock icon again to mark the
task as ready for validation.
-
Click Validate in the top-right corner of the
page to validate the modifications you made to the records.
The master records are created, and the validated records are
removed from the list and transitioned to the next step in the
workflow, where another data steward needs to approve them. In this
example, they are moved to the list of the data steward who is granted the
Account manager role.
-
The data stewards with the Account
manager role access the tasks to be validated and decide whether to
accept or reject the choices made on the tasks.
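The four survivorship rules offered by the master-record icons (first value, most common, most recent, most trusted) can be illustrated with a short Python sketch. This is not how Talend Cloud Data Stewardship implements them: the record layout (dictionaries with hypothetical `source` and `updated` fields), the `TRUST` source ranking, and all function names are assumptions made for illustration. Each function picks one value for an attribute from a group of duplicates, and the `avoid_nulls` flag mimics the Avoid null values check box.

```python
from collections import Counter
from datetime import date

# Hypothetical group of duplicates, in the order they appear in the task.
# "source" and "updated" are illustrative fields, not Talend's data model.
DUPLICATES = [
    {"First_Name": "Jon",  "source": "CRM", "updated": date(2021, 3, 1)},
    {"First_Name": None,   "source": "CRM", "updated": date(2023, 6, 9)},
    {"First_Name": "John", "source": "CRM", "updated": date(2022, 1, 5)},
    {"First_Name": "John", "source": "CRM", "updated": date(2020, 8, 2)},
]

# Hypothetical trust ranking of sources (lower rank = more trusted).
TRUST = {"ERP": 0, "CRM": 1}


def candidates(records, attr, avoid_nulls):
    """Keep the records eligible for a rule, skipping null values if asked."""
    return [r for r in records if not (avoid_nulls and r[attr] is None)]


def first(records, attr, avoid_nulls=True):
    # "First" follows the order of the records when the task was created.
    pool = candidates(records, attr, avoid_nulls)
    return pool[0][attr] if pool else None


def most_common(records, attr, avoid_nulls=True):
    pool = candidates(records, attr, avoid_nulls)
    if not pool:
        return None
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(r[attr] for r in pool).most_common(1)[0][0]


def most_recent(records, attr, avoid_nulls=True):
    pool = candidates(records, attr, avoid_nulls)
    if not pool:
        return None
    return max(pool, key=lambda r: r["updated"])[attr]


def most_trusted(records, attr, avoid_nulls=True):
    pool = candidates(records, attr, avoid_nulls)
    if not pool:
        return None
    # Lowest rank wins; sources missing from TRUST rank last.
    return min(pool, key=lambda r: TRUST.get(r["source"], len(TRUST)))[attr]


print(most_common(DUPLICATES, "First_Name"))                     # John
print(most_recent(DUPLICATES, "First_Name"))                     # John
print(most_recent(DUPLICATES, "First_Name", avoid_nulls=False))  # None
```

Clearing `avoid_nulls` lets the newest record win even though its value is null, which is the trade-off the check box controls. Because every record here comes from the same CRM source, all records share one trust rank and `most_trusted` just returns the first candidate, which illustrates why a trust-based rule is of no use in a single-source campaign.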
Results
Approved tasks are transitioned to the Resolved state in the workflow. Rejected tasks
are transitioned back to the initial step in the workflow and marked as new.
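The task lifecycle described in this procedure can be modeled as a small state machine. The following Python code is a hedged sketch, not Talend's workflow engine: the `State` names, the `Task` class and its methods, and the use of `None` to stand for an invalid value are all hypothetical, and they only approximate the two-step workflow of this example, in which the Account manager approves or rejects validated tasks. Real campaign workflows are configured in the product and may define other states.

```python
from enum import Enum


class State(Enum):
    NEW = "new"                # initial state, dark gray background
    READY = "ready"            # marked as ready, green background
    TO_APPROVE = "to approve"  # validated, waiting for the Account manager
    RESOLVED = "resolved"      # approved by the Account manager


class Task:
    def __init__(self, values):
        self.values = dict(values)
        self.state = State.NEW

    def has_invalid_value(self):
        # Hypothetical validity check: None stands for an invalid value.
        return any(v is None for v in self.values.values())

    def modify(self, attr, value):
        if self.state not in (State.NEW, State.READY):
            raise ValueError("task is no longer editable")
        self.values[attr] = value
        # Editing a ready task puts it back in its initial state.
        self.state = State.NEW

    def mark_ready(self):
        # A task with an invalid value cannot be marked as ready.
        if self.has_invalid_value():
            raise ValueError("correct the invalid value first")
        self.state = State.READY

    def validate(self):
        if self.state is not State.READY:
            raise ValueError("only ready tasks can be validated")
        self.state = State.TO_APPROVE

    def review(self, approved):
        if self.state is not State.TO_APPROVE:
            raise ValueError("task is not awaiting approval")
        # Approved tasks are resolved; rejected ones go back to the start.
        self.state = State.RESOLVED if approved else State.NEW


task = Task({"First_Name": None})
task.modify("First_Name", "John")
task.mark_ready()
task.validate()
task.review(approved=False)
print(task.state)  # State.NEW
```

Raising an exception on a forbidden transition keeps the rules in one place, so invalid paths such as validating a task that was never marked as ready fail immediately instead of silently changing the state.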