Scenario: Altering data values to restrict the use of actual sensitive data - 6.1

Talend Components Reference Guide

With the tDataMasking component, you can replace sensitive information such as credit card or social security numbers with realistic values, allowing production data to be safely used for purposes such as testing and training.

This scenario describes a Job which uses:

  • the tFixedFlowInput component to generate personal data including credit card numbers,

  • the tDataMasking component to replace specific original data with random characters or digits,

  • the tFileOutputExcel component to output the substitute data set.

Setting up the Job

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tDataMasking and tFileOutputExcel.

  2. Connect the three components together using the Main links.

Configuring the input component

  1. Double-click tFixedFlowInput to open its Basic settings view in the Component tab.

  2. Create the schema through the Edit Schema button.

    In the dialog box that opens, click the [+] button and add the columns that will hold the initial input data.

  3. Click OK.

  4. In the Number of rows field, enter 1.

  5. In the Mode area, select the Use Inline Content option.

  6. In the Content table, enter the customer data you want to replace with realistic values, for example:

    0|4244487462024688|Nowmer|Sheri|A.|2433 Bailey Road|Tlaxiaco|Oaxaca|15057|Mexico|271-555-9715|SheriNowmer@@Tlaxiaco.org
    1|3458687462024688||Sheri|A.|2433 Bailey Road|Tlaxiaco|Oaxaca|15057|Mexico|271-555-9715|SheriNowmer@Tlaxiaco.org.org
    2|4639587470586299|Whelply|Derrick|I.|2219 Dewing Avenue|Sooke|BC|17172|Canada|211-555-7669|DerrickWhelply@Sooke.org
    3|2541387475757600|Derry|Jeanne||7640 First Ave.|Issaquah|WA|73980|USA|656-555-2272|JeanneDerry@Issaquah.org
    4|7845987500482201|Spence|Michael|J.|337 Tosca Way|Burnaby|BC|74674|Canada|929-555-7279|MichaelSpence@Burnaby.org
    5|1547887514054179|Gutierrez|Maya||8668 Via Neruda|Novato|CA|57355|$$#|387-555-7172|MayaGutierrez@Novato.org
    6|5469887517782449|Damstra|Robert|F.|1619 Stillman Court|Lynnwood|WA|90792|$$#|922-555-5465|RobertDamstra@Lynnwood.org
    7|54896387521172800|Kanagaki|Rebecca||2860 D Mt. Hood Circle|||13343|Mexico|515-555-6247|RebeccaKanagaki@Tlaxiaco.org
    8|47859687539744377||Kim|H.|6064 Brodia Court|San Andres|DF|12942|Mexico|411-555-6825|Kim@Brunner@San Andresorg
    9|35698487544797658||Brenda|C.|7560 Trees Drive||BC|$$|Canada|815-555-3975|BrendaBlumberg@Richmond.org
    10|36521487568712234|Stanz|Darren|M.|1019 Kenwal Rd.|$$#|OR|82017|USA|847-555-5443|DarrenStanz@Lake Oswego.org
    ...
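To make the layout of the inline content concrete, the following minimal Python sketch splits pipe-delimited rows into records outside of Talend Studio. The scenario does not list the schema, so the column names below are assumptions inferred from the sample values, and parse_rows is a hypothetical helper, not a Talend function.

```python
# Hypothetical column names: the scenario does not list the actual schema.
COLUMNS = [
    "id", "credit_card", "last_name", "first_name", "middle_initial",
    "address", "city", "state", "zip", "country", "phone", "email",
]


def parse_rows(content):
    """Split pipe-delimited inline content into one dict per row."""
    records = []
    for line in content.strip().splitlines():
        fields = line.strip().split("|")
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        records.append(dict(zip(COLUMNS, fields)))
    return records


# Two rows copied from the sample content above.
sample = """
2|4639587470586299|Whelply|Derrick|I.|2219 Dewing Avenue|Sooke|BC|17172|Canada|211-555-7669|DerrickWhelply@Sooke.org
3|2541387475757600|Derry|Jeanne||7640 First Ave.|Issaquah|WA|73980|USA|656-555-2272|JeanneDerry@Issaquah.org
"""
rows = parse_rows(sample)
print(rows[0]["last_name"], rows[0]["email"])
```

Empty fields, such as the missing middle initial in the second row, are kept as empty strings so that every record has the same twelve keys.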

Replacing actual data with realistic values

  1. Double-click tDataMasking to display the Basic settings view and define the component properties.

  2. If required, click Sync columns to retrieve the schema defined in the input component.

  3. Click the Edit schema button to open the schema dialog box.

    tDataMasking proposes one predefined read-only column, ORIGINAL_MARK.

    This column indicates, with true or false, whether the output record is an original or a substitute record, respectively.

  4. Move any of the input columns to the output schema if you want to show them in the results, then click OK and accept the prompt to propagate the changes.

  5. In the Modifications table, click the [+] button to add four rows, and then:

    • in the Input Column, select the columns whose content you want to substitute,

    • in the Function column, select from the predefined list the function you want to use to generate the substitute data,

    • in the Parameter column, enter a value, a pattern or a path to be used by the function to substitute data.

    The Job will generate inauthentic credit card numbers, replace the first three letters of first names, replace last names with names from a local file and finally replace the part before the @ sign in email addresses with a series of X characters.

  6. Click the Advanced settings tab and select the Output the original row check box.

    The Job will add the original data rows to the substitute data.
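The four substitutions and the Output the original row option can be approximated in plain Python as follows. This is only a sketch of the idea, not the tDataMasking implementation: the function names, the column names, the in-memory list standing in for the local file of last names, and the order in which original and substitute rows are emitted are all assumptions.

```python
import random
import string


def mask_credit_card(number, rng):
    """Return a random digit string with the same length as the input."""
    return "".join(rng.choice(string.digits) for _ in number)


def replace_first_chars(value, count, rng):
    """Replace the first `count` characters with random letters, keeping case."""
    head = "".join(
        rng.choice(string.ascii_uppercase if ch.isupper() else string.ascii_lowercase)
        for ch in value[:count]
    )
    return head + value[count:]


def mask_email_local_part(email):
    """Replace everything before the first @ sign with X characters."""
    local, sep, domain = email.partition("@")
    if not sep:  # no @ sign: leave the value unchanged (assumption)
        return email
    return "X" * len(local) + sep + domain


def run_masking(rows, last_names, seed=0, output_original=True):
    """Emit each original row (marked True) followed by its substitute (False)."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        if output_original:
            out.append({**row, "ORIGINAL_MARK": True})
        masked = dict(row)
        masked["credit_card"] = mask_credit_card(row["credit_card"], rng)
        masked["first_name"] = replace_first_chars(row["first_name"], 3, rng)
        masked["last_name"] = rng.choice(last_names)  # stands in for the local file
        masked["email"] = mask_email_local_part(row["email"])
        masked["ORIGINAL_MARK"] = False
        out.append(masked)
    return out


rows_in = [{
    "credit_card": "4639587470586299",
    "last_name": "Whelply",
    "first_name": "Derrick",
    "email": "DerrickWhelply@Sooke.org",
}]
for record in run_masking(rows_in, ["Smith", "Jones"], seed=1):
    print(record)
```

Passing a fixed seed makes the sketch reproducible. Whether the component itself behaves deterministically is not stated in this scenario.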

Configuring the output component and executing the Job

  1. Double-click the tFileOutputExcel component to display the Basic settings view and define the component properties.

  2. Set the destination file name as well as the sheet name and then select the Define all columns auto size check box.

  3. Save your Job and press F6 to execute it.

    The tDataMasking component substitutes data in the selected columns and writes the result to an output file.

  4. Right-click the output component and select Data Viewer to display the original and substituted data.

    tDataMasking outputs original and substitute rows, marked with true and false respectively in the ORIGINAL_MARK column. It generates inauthentic credit card numbers, replaces the first three letters of first names, replaces last names with names from a local file and finally replaces the part before the @ sign in email addresses with the series of X characters set in the component's Basic settings.

    Sensitive personal information in the input data has been "hidden", but the data still looks real and consistent. The substitute data remains usable for purposes other than production.
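One way to confirm that substitute rows hide the original values while keeping a realistic shape is a small check such as the one below. It is a sketch under stated assumptions: check_substitute is a hypothetical helper, the column names are not taken from the scenario, and the substitute values are hand-written for illustration rather than actual Job output.

```python
def check_substitute(original, substitute):
    """Return a list of problems found in a substitute row (empty if none)."""
    problems = []

    # The masked card number should keep the same length and stay numeric.
    card = substitute["credit_card"]
    if not card.isdigit() or len(card) != len(original["credit_card"]):
        problems.append("credit card format changed")

    # The email domain should survive, but the local part should not.
    orig_local, _, orig_domain = original["email"].partition("@")
    sub_local, _, sub_domain = substitute["email"].partition("@")
    if sub_domain != orig_domain:
        problems.append("email domain changed")
    if orig_local and sub_local == orig_local:
        problems.append("email local part still exposed")

    return problems


original = {
    "credit_card": "4639587470586299",
    "email": "DerrickWhelply@Sooke.org",
}
# Hand-written illustrative values, not actual Job output.
substitute = {
    "credit_card": "1234567890123456",
    "email": "XXXXXXXXXXXXXX@Sooke.org",
}
print(check_substitute(original, substitute))
```

An empty list means the substitute row passed every check in this sketch. It does not prove that the masking is irreversible.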