Setting rules and values for master records - Cloud

Talend Cloud Data Stewardship Getting Started Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Stewardship
Last publication date: 2024-03-05
When duplicate client records come from different sources, Talend Cloud Data Stewardship initially determines which attributes of the matched records to use to create master records, according to the survivorship rules defined in the campaign.

About this task

Data stewards review their tasks and, for each record attribute, either change the survivorship rule manually or enter a completely new value, in order to produce the most accurate and reliable master records.
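As a rough illustration of what the survivorship rules named in the procedure below do, the following Python sketch picks one surviving value per attribute from a group of matched records. It is not Talend code: the record layout, the `updated` and `source` fields, the `trust_order` list and all function names are assumptions made for this example, "valid" is simplified to "not null", and the effect of the Avoid null values option is modeled as simply excluding null values from the candidates.

```python
from collections import Counter
from datetime import date

# Hypothetical matched records; the field names are illustrative only.
records = [
    {"source": "CRM", "updated": date(2023, 5, 1), "Last_Name": "Smith"},
    {"source": "ERP", "updated": date(2024, 1, 15), "Last_Name": None},
    {"source": "Web", "updated": date(2023, 11, 3), "Last_Name": "Smyth"},
    {"source": "Shop", "updated": date(2022, 7, 9), "Last_Name": "Smith"},
]

def candidates(records, attribute, avoid_null=True):
    """Records whose value may survive (assumed meaning of 'Avoid null values')."""
    if avoid_null:
        return [r for r in records if r[attribute] is not None]
    return list(records)

def first_valid(records, attribute):
    """First non-null value, in the order the records were listed."""
    pool = candidates(records, attribute, avoid_null=True)
    return pool[0][attribute] if pool else None

def most_common(records, attribute, avoid_null=True):
    """Value that occurs most often among the candidates."""
    pool = candidates(records, attribute, avoid_null)
    if not pool:
        return None
    return Counter(r[attribute] for r in pool).most_common(1)[0][0]

def most_recent(records, attribute, avoid_null=True):
    """Value from the candidate record with the latest 'updated' date."""
    pool = candidates(records, attribute, avoid_null)
    if not pool:
        return None
    return max(pool, key=lambda r: r["updated"])[attribute]

def most_trusted(records, attribute, trust_order, avoid_null=True):
    """Value from the candidate whose source ranks highest in trust_order."""
    pool = candidates(records, attribute, avoid_null)
    if not pool:
        return None
    rank = {name: i for i, name in enumerate(trust_order)}
    # Sources missing from trust_order are ranked last.
    return min(pool, key=lambda r: rank.get(r["source"], len(rank)))[attribute]

trust_order = ["ERP", "CRM", "Web", "Shop"]  # hypothetical ranking, most trusted first

print(first_valid(records, "Last_Name"))
print(most_common(records, "Last_Name"))
print(most_recent(records, "Last_Name"))
print(most_recent(records, "Last_Name", avoid_null=False))
print(most_trusted(records, "Last_Name", trust_order))
```

In this sample data the most recent and most trusted record (ERP) has no last name, so under the assumption above, leaving null values excluded lets the next-best record supply the value, whereas including them would leave the master attribute empty.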

Procedure

  1. Log in as a data steward.
  2. On the Tasks page, click the campaign name, Reconciling client data in this example, to open a list of the tasks assigned to you.
    The quality bar at the top of the list uses colors to give you a clear view of the quality of the data in each of the columns. Pointing to a color gives you details about the data values in the selected column.
    List of tasks assigned to the user in the Reconciling client data campaign.
  3. Click a color in the quality bar to filter the data on which you want to work and list the tasks which match the color indication:
    • Green: Represents valid data which matches the column type.
    • Grey: Represents empty fields. However, an empty value for a mandatory field is marked as red, not grey.
    • Red: Represents invalid data which does not match the column type or the parameter set in the data model.
  4. Click the down arrow in the top-left corner of the task list to expand all the tasks, or click the down arrow of a specific task to expand it.
  5. Set survivorship rules to select attributes from customer records and use them to build the master records. Several approaches are possible:
    • Set a survivorship rule manually for one attribute of multiple records.

      1. Click a column heading, Last_Name for example, and in the right-hand panel browse to the Survivorship section.
      2. Expand the Survivorship rule list and select Most common as the survivorship rule you want to apply to the name attribute in all the tasks in the list.
      3. If you want to apply the rule to all name values including null ones, clear the Avoid null values check box; otherwise, leave it selected.
      4. Click Submit to select the most common name values and add them to the master records of all the tasks.
    • Set a survivorship rule manually for all attributes of one or multiple golden records.

      1. Select the tasks for which to set the rule, and under Task in the right-hand panel click Apply survivorship rule.
      2. From the Selection list, click Selected tasks.

        You can apply the rule to all tasks or only to the filtered tasks if you have defined a filter on the list.

      3. From the Rule list, select the rule to apply to the group of selected tasks, Most trusted for example.

        If you defined the sources of the duplicate data in the Merging campaign, the source names are included in the list and can be selected as the survivorship rule to apply to the column values.

      4. If you want to apply the rule to all values including null ones, clear the Avoid null values check box; otherwise, leave it selected.
      5. Click Submit to add the name values with the highest score to the selected golden records.
    • Set a survivorship rule manually for one or several attributes of a record: point to an attribute in the master record of a task and, from the icons that are displayed, select the survivorship rule you want to apply.

      • Select first valid icon: Selects the first valid attribute value among the duplicates. "First" is defined by the order of the records when the task is created.
      • Select most common icon: Selects the most common attribute value among the duplicates.
      • Select most recent icon: Selects the most recent attribute value among the duplicates.
      • Select most trusted icon: Selects the most trusted attribute value among the duplicates.

        Survivorship icons are grayed out when the survivorship rule is not applicable to the selected record.

    • Select the value of a given source attribute to be the value for the master record: point to a source attribute and click the up arrow to set the selected value in the master record.
  6. Optionally, double-click the value in the master record and set a value of your choice which is not present in any of the sources.
  7. Click the Lock icon next to the data record you modified to mark the task as ready to be validated.
    The first field is marked with a green background, and the percentage of your tasks that are completed is calculated and displayed in the top right corner.

    You can modify records that are ready to be validated again, but this puts the task back to its initial state with a dark-grey background color. You need to click the lock icon again to mark the task as ready for validation.

  8. If the lock icon has a red background color, correct the invalid value in the task. Only then can you mark the task as ready to be validated.
  9. Repeat the above steps to create master records for all the tasks assigned to you.
  10. Click Validate in the top right corner to approve the changes and remove the tasks from your list.
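
The color coding of the quality bar described in step 3 can be sketched in Python as follows. This is only an illustration of the stated logic (valid, empty, invalid, with empty mandatory fields counted as invalid), not the product's implementation: the `classify` function, its `parser` argument and the sample column are hypothetical, and "matches the column type" is approximated by whether the value parses with the given function.

```python
from collections import Counter

def classify(value, parser, mandatory=False):
    """Return the quality-bar color for one value (illustrative logic only)."""
    if value is None or value == "":
        # An empty value in a mandatory field counts as invalid.
        return "red" if mandatory else "grey"
    try:
        parser(value)  # stand-in for "matches the column type"
    except ValueError:
        return "red"
    return "green"

# Hypothetical integer column with one empty and one invalid value.
column = ["42", "", "abc", "7"]

colors = [classify(v, int) for v in column]
summary = Counter(colors)

print(colors)
print(dict(summary))
print(classify("", int, mandatory=True))
```

Counting the colors per column, as `summary` does here, mirrors the kind of per-column overview the quality bar gives before you filter the task list by color.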

Results

Master records are created, and the validated records are moved to the list of the campaign participant who has been granted the Account validator role in this example.