Setting a data model in the Merging campaign - Cloud

Talend Cloud Data Stewardship Examples

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Data Governance > Assigning tasks; Data Governance > Managing campaigns; Data Governance > Managing data models; Data Quality and Preparation > Handling tasks
EnrichPlatform: Talend Data Stewardship

Data models define the structure of the data to be managed. They are used for the syntactic and semantic validation of data.

You can define read/write access permissions per role for each of the attributes listed in a data model.

Procedure

  1. On the ADD CAMPAIGN page, click DATA MODEL and select from the model list the data structure you want to use in the CRM data deduplication campaign.

    The Data Model list gives access to all the data models that have been defined.

  2. Use the buttons next to each attribute in the data structure to set permissions per attribute and per data steward, and so define who can view or edit which attributes.
    Option: Description
    Read/write: Provides read/write access to the attribute in the data model.
    Read-only: Provides only read access to the attribute in the data model.

    This type of access is useful if the data steward needs to see the information to make a relevant decision but must not change the value, for example unique identifiers of other elements linked to the entity the steward is viewing, or data that you know is reliable and must not be changed.

    Hidden: Provides no access to the attribute.

    Hiding an attribute is useful if the information is sensitive and should not be visible to the data steward, for example financial information. It is also useful if the information is just noise for the steward, such as a technical identifier, but still needs to be propagated as part of the task.

    Example

    In the CRM Data Deduplication campaign, you grant a read-only access to the identifier attribute for the data stewards who are assigned the Account analyst role.

  3. Select a rule from the Survivorship Rule lists next to each of the attributes.
    These rules are used to decide what attribute values define the master records when loading data into the campaign. Data stewards can then manually modify these choices.
    Option: Description
    First valid: Selects the first source which contains a valid value with regards to the data type of the attribute defined in the data model. "First" is defined by the order of the records when the task is created.
    First not null: Selects the first source which contains a non-empty value, where "first" is defined by the order of the records when the task is created.
    Most common: Selects the most common attribute value of the duplicates coming from one or more data sources.
    Most recent: Selects the most recent attribute value of the duplicates coming from one or more data sources. This is based on the metadata of the last update date.
    Most trusted: Selects the most trusted attribute value of the duplicates as per the trust score you set when creating the campaign or when loading the tasks in the campaign. If no trust score is defined, this option does not work.
    You can apply one rule to all the attributes by selecting it from the list in the top right corner of the form. If a given rule cannot be applied, it falls back to First not null. For example, if you select Most trusted during the campaign definition but do not set a trust score, First not null is used instead. Similarly, First not null is used if you select Most common or First valid and the duplicates contain no common or no valid values.

    Example

    The following examples show how survivorship rules determine which value is chosen to build master records.
    First valid: Email address:
    • If the first value is not valid while the second is, the second email wins.
    • If all email addresses are invalid, the first non-empty value wins.
    First not null: First name:
    • If the first value is empty while the second is not, the second first name wins.
    • If all first names are empty, first name is empty in master record.
    Most common: Last name:
    • If last names are identical in two source records, this value wins.
    • If last names are different in all source records, the first non-empty value wins.
    Most recent: Phone number and timestamp:
    • If one phone number has the most recent timestamp, this value wins.
    • If all phone numbers have the same timestamp, the first non-empty value wins.
    Most trusted: Address:
    • If all addresses in the source records have trust scores, the value with the highest score wins.
    • If all addresses in the source records have trust scores and two are identical, the first identical address wins.
    • If none of the addresses have trust scores, the first non-empty value wins.
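
The survivorship rules and fallbacks described in step 3 can be illustrated with a small Python sketch. This is not Talend code: the SourceValue class, the function names and the loose email check are all hypothetical and exist only to mirror the examples above. Two behaviors are assumptions rather than documented facts: when only some duplicates carry a timestamp or a trust score, the sketch ranks only those that do, and when timestamps or scores are tied, the earliest record wins.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional


@dataclass
class SourceValue:
    """One duplicate record's value for a single attribute (hypothetical shape)."""
    value: Optional[str]
    last_updated: Optional[datetime] = None  # used by most_recent
    trust_score: Optional[float] = None      # used by most_trusted


def _non_empty(values: List[SourceValue]) -> List[SourceValue]:
    """Keep record order, drop None and empty strings."""
    return [v for v in values if v.value not in (None, "")]


def first_not_null(values: List[SourceValue]) -> Optional[str]:
    """First non-empty value in record order; None if every value is empty."""
    candidates = _non_empty(values)
    return candidates[0].value if candidates else None


def first_valid(values: List[SourceValue],
                is_valid: Callable[[str], bool]) -> Optional[str]:
    """First value that passes is_valid; falls back to first_not_null."""
    for v in _non_empty(values):
        if is_valid(v.value):
            return v.value
    return first_not_null(values)


def most_common(values: List[SourceValue]) -> Optional[str]:
    """Value shared by at least two sources; falls back to first_not_null."""
    counts = Counter(v.value for v in _non_empty(values))
    if counts:
        value, count = counts.most_common(1)[0]
        if count > 1:
            return value
    return first_not_null(values)


def most_recent(values: List[SourceValue]) -> Optional[str]:
    """Value with the latest timestamp; falls back to first_not_null."""
    dated = [v for v in _non_empty(values) if v.last_updated is not None]
    if not dated:
        return first_not_null(values)
    # max() returns the first maximal item, so ties keep record order.
    return max(dated, key=lambda v: v.last_updated).value


def most_trusted(values: List[SourceValue]) -> Optional[str]:
    """Value with the highest trust score; falls back to first_not_null."""
    scored = [v for v in _non_empty(values) if v.trust_score is not None]
    if not scored:
        return first_not_null(values)
    return max(scored, key=lambda v: v.trust_score).value


def is_email(text: str) -> bool:
    """Deliberately loose email check, for illustration only."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", text) is not None


emails = [SourceValue("not-an-email"), SourceValue("ann@example.com")]
print(first_valid(emails, is_email))  # ann@example.com
```

Every rule delegates to first_not_null when it cannot decide, which matches the fallback behavior described for the campaign definition.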
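
The per-role attribute permissions set in step 2 can be pictured as a mapping from role to attribute to access level. The Python sketch below is purely illustrative and does not reflect how Talend Cloud Data Stewardship stores or enforces permissions. The Access enum, the function names and the "revenue" attribute are hypothetical, and the sketch assumes that an attribute with no explicit setting is read/write.

```python
from enum import Enum
from typing import Dict


class Access(Enum):
    READ_WRITE = "read/write"
    READ_ONLY = "read-only"
    HIDDEN = "hidden"


# Hypothetical layout: role -> attribute -> access level.
Permissions = Dict[str, Dict[str, Access]]

PERMISSIONS: Permissions = {
    "Account analyst": {
        "identifier": Access.READ_ONLY,  # mirrors the example in step 2
        "revenue": Access.HIDDEN,        # hypothetical sensitive attribute
    },
}


def access_for(perms: Permissions, role: str, attribute: str) -> Access:
    # Assumption: attributes without an explicit entry are read/write.
    return perms.get(role, {}).get(attribute, Access.READ_WRITE)


def visible_view(perms: Permissions, role: str, record: dict) -> dict:
    """Attributes the steward sees; hidden ones stay in the record itself."""
    return {
        name: value
        for name, value in record.items()
        if access_for(perms, role, name) is not Access.HIDDEN
    }


def apply_edit(perms: Permissions, role: str, record: dict,
               attribute: str, new_value: str) -> dict:
    """Return an updated copy, or raise if the role may not edit the attribute."""
    if access_for(perms, role, attribute) is not Access.READ_WRITE:
        raise PermissionError(f"{role!r} cannot edit {attribute!r}")
    return {**record, attribute: new_value}


record = {"identifier": "ACC-001", "name": "Acme", "revenue": "1000"}
print(visible_view(PERMISSIONS, "Account analyst", record))
# {'identifier': 'ACC-001', 'name': 'Acme'}
```

The hidden attribute is filtered out of the view only. It remains in the underlying record, in line with the note that hidden attributes still need to be propagated as part of the task.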