Setting a data model in the Merging campaign - 6.5

Talend Data Stewardship Examples

Author: Talend Documentation Team

The data model used in a campaign determines the structure of the data to be managed.

In a campaign, you select the data model to use for the syntactic and semantic validation of data, and you set the read/write access permissions per role for each attribute in the selected data model.

Procedure

  1. In the homepage, click Data Model and select from the model list the data structure you want to use in the CRM Data Deduplication campaign.

    The Data Model list gives access to all the data models that have been defined on the Talend Data Stewardship server.

  2. Select the buttons next to each attribute in the data structure to set permissions per attribute and per data steward, and so define who can view or edit which attributes.

    The buttons set one of three access levels:
    • Read/write: gives read/write access to the attribute in the data model.
    • Read-only: gives only read access to the attribute in the data model. This type of access is useful if the data steward needs the information to make a relevant decision but must not change the value. Examples are unique identifiers of other elements linked to the entity the steward is viewing, or data that you know is reliable and must not be changed.
    • No access: gives no access to the attribute. Hiding an attribute is useful if the information is sensitive and should not be visible to the data steward, such as financial information. It is also useful if the information is just noise for the steward, such as a technical identifier, but still needs to be propagated as part of the task.

    For example, in the CRM Data Deduplication campaign, you grant read-only access to the identifier attribute for the data stewards who are assigned the account analyst role.
  3. Select a rule from the Survivorship Rule lists next to each of the attributes.
    These rules are automatically used to decide what attribute values define the master records when loading data into the campaign. Data stewards can then manually modify these choices.
    Option           Description
    First valid      Selects the first source that contains a valid value with regard to the data type of the attribute defined in the data model. "First" is defined by the order of the records when the task is created.
    First not null   Selects the first source that contains a value, where "first" is defined by the order of the records when the task is created.
    Most common      Selects the most common attribute value of the duplicates coming from one or more data sources.
    Most recent      Selects the most recent attribute value of the duplicates coming from one or more data sources. This is based on the metadata of the last update date.
    Most trusted     Selects the most trusted attribute value of the duplicates, according to the trust score you set when creating the campaign or when loading the tasks in the campaign. If no trust score is defined, this option does not work.
    You can apply one rule to all the attributes by selecting it from the list in the top-right corner of the form. If a given rule cannot be applied, it falls back to First not null. For example, if you select Most trusted during the campaign definition but do not set a trust score, First not null is used instead. Similarly, First not null is used if you select Most common or First valid and there are no common or no valid values among the duplicates.
    The following examples show how survivorship rules determine which value is chosen to build master records.
    First valid: Email address:
    • If the first value is not valid while the second is, the second email wins.
    • If all email addresses are invalid, the first non-empty value wins.
    First not null: First name:
    • If the first value is empty while the second is not, the second first name wins.
    • If all first names are empty, the first name is empty in the master record.
    Most common: Last name:
    • If last names are identical in two source records, this value wins.
    • If last names are different in all source records, the first non-empty value wins.
    Most recent: Phone number and timestamp:
    • If one phone number has the most recent timestamp, this value wins.
    • If all phone numbers have the same timestamp, the first non-empty value wins.
    Most trusted: Address:
    • If all addresses in the source records have trust scores, the value with the highest score wins.
    • If all addresses in the source records have trust scores and two are identical, the first identical address wins.
    • If none of the addresses have trust scores, the first non-empty value wins.
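
The survivorship rules and their fallback to First not null can be sketched in Python as follows. This is an illustrative model only, not Talend Data Stewardship code. The SourceValue class, the function names, the validity callback, and the choice to treat None and the empty string as "empty" are all assumptions made for this example. Ties are resolved in favor of the earliest record, which matches the examples above but is also an assumption about the product's behavior.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional, Sequence


@dataclass
class SourceValue:
    """One attribute value taken from one source record (hypothetical structure)."""
    value: Optional[str]
    last_update: Optional[datetime] = None  # "last update date" metadata
    trust_score: Optional[int] = None       # trust score of the source, if any


def _non_empty(candidates: Sequence[SourceValue]) -> List[SourceValue]:
    # Assumption: None and "" both count as empty; record order is preserved.
    return [c for c in candidates if c.value not in (None, "")]


def first_not_null(candidates: Sequence[SourceValue]) -> Optional[str]:
    """First non-empty value in record order, or None if every value is empty."""
    filled = _non_empty(candidates)
    return filled[0].value if filled else None


def first_valid(candidates: Sequence[SourceValue],
                is_valid: Callable[[str], bool]) -> Optional[str]:
    """First value accepted by is_valid; falls back to first_not_null."""
    for c in _non_empty(candidates):
        if is_valid(c.value):
            return c.value
    return first_not_null(candidates)


def most_common(candidates: Sequence[SourceValue]) -> Optional[str]:
    """Value shared by the most sources; falls back if no value repeats."""
    counts = Counter(c.value for c in _non_empty(candidates))
    if counts:
        value, count = counts.most_common(1)[0]
        if count > 1:  # at least two sources agree
            return value
    return first_not_null(candidates)


def most_recent(candidates: Sequence[SourceValue]) -> Optional[str]:
    """Value with the latest last_update; falls back if no timestamps exist."""
    dated = [c for c in _non_empty(candidates) if c.last_update is not None]
    if not dated:
        return first_not_null(candidates)
    # max() returns the first maximal item, so equal timestamps keep record order.
    return max(dated, key=lambda c: c.last_update).value


def most_trusted(candidates: Sequence[SourceValue]) -> Optional[str]:
    """Value with the highest trust score; falls back if no scores are set."""
    scored = [c for c in _non_empty(candidates) if c.trust_score is not None]
    if not scored:
        return first_not_null(candidates)
    # Equal scores keep record order, for the same reason as in most_recent.
    return max(scored, key=lambda c: c.trust_score).value
```

Each function takes the candidate values for a single attribute in the order of the records in the task. For instance, calling most_trusted on values that carry no trust score returns the same result as first_not_null, which mirrors the fallback described in step 3.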