Configuring the tDataShuffling component - 7.0

Data privacy

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
task
Data Governance > Third-party systems > Data Quality components > Data privacy components
Data Quality and Preparation > Third-party systems > Data Quality components > Data privacy components
Design and Development > Third-party systems > Data Quality components > Data privacy components
EnrichPlatform
Talend Studio

Procedure

  1. Double-click tDataShuffling to display the Basic settings view and define the component properties.
  2. Click Sync columns to retrieve the schema defined in the input component.
  3. In the Shuffling columns table, click the [+] button to add four rows, and then:
    • In the Column column, select the columns whose data is to be shuffled.

    • In the Group ID column, select a group identifier for each column. Columns that share the same group identifier are shuffled together.

    In this example, two groups of columns are shuffled:
    • Group ID 1: credit_card

    • Group ID 2: lname, fname and mi

    The Job replaces each credit card number in the credit_card column with a value from a different row. It also keeps the last name, first name and middle initial values from the lname, fname and mi columns together, and replaces each such set of values with the set from a different row.
  4. Click the Advanced settings tab.
    In the Partitioning columns table, click the [+] button to add one row, and then select the partitioning column (country in this example).
    The Job shuffles values only among the original data rows that share the same values in the partitioning columns.
    In this example, the component applies the shuffling process separately to each set of rows that share the same value in the country column.
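
The effect of the settings in this procedure can be illustrated outside Talend Studio with a short Python sketch. It is not the component's implementation: the shuffle_rows function, its parameters and the sample rows are hypothetical, and it uses a plain random permutation, which, unlike the behavior described above, may occasionally leave a value in its original row. It only shows the two ideas from the procedure: columns in the same group move together, and values move only between rows of the same partition.

```python
import random
from collections import defaultdict


def shuffle_rows(rows, groups, partition_cols, seed=None):
    """Return a copy of rows with each column group shuffled inside each partition.

    Hypothetical helper for illustration only, not Talend code.
    groups maps a group ID to the list of columns shuffled together.
    """
    rng = random.Random(seed)
    result = [dict(row) for row in rows]  # leave the input rows untouched

    # Bucket row indices by the values of the partitioning columns.
    partitions = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[col] for col in partition_cols)
        partitions[key].append(index)

    for indices in partitions.values():
        for columns in groups.values():
            # One permutation per group, so the columns of a group move together.
            sources = indices[:]
            rng.shuffle(sources)
            for target, source in zip(indices, sources):
                for col in columns:
                    result[target][col] = rows[source][col]
    return result


# Made-up sample data using the column names from this example.
rows = [
    {"fname": "Ann", "mi": "B", "lname": "Cole", "credit_card": "1111", "country": "US"},
    {"fname": "Dan", "mi": "E", "lname": "Frost", "credit_card": "2222", "country": "US"},
    {"fname": "Gia", "mi": "H", "lname": "Ives", "credit_card": "3333", "country": "US"},
    {"fname": "Luc", "mi": "M", "lname": "Noel", "credit_card": "4444", "country": "FR"},
    {"fname": "Zoe", "mi": "P", "lname": "Roux", "credit_card": "5555", "country": "FR"},
]

# Group ID 1: credit_card; Group ID 2: lname, fname and mi.
groups = {1: ["credit_card"], 2: ["lname", "fname", "mi"]}

shuffled = shuffle_rows(rows, groups, ["country"], seed=42)
for row in shuffled:
    print(row)
```

In the output, each name triple stays intact and each credit card number stays within its country, but names and card numbers are permuted independently of each other because they belong to different groups.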