Dynamically using the data from another dataset - Cloud

Talend Cloud Data Preparation User Guide

EnrichVersion
Cloud
EnrichProdName
Talend Cloud
EnrichPlatform
Talend Data Preparation
task
Administration and Monitoring > Managing connections
Data Quality and Preparation > Cleansing data
Data Quality and Preparation > Managing datasets

The lookup feature matches data from the current preparation with its counterpart in a reference dataset, allowing you add this reference data to your preparation.

In this example, you are working on a preparation that contains information about the American States your customers live in, but only in the form of a two-letter code. In parallel, you own a second dataset where the two-letter American State codes are matched with the full name of the State. You will use the lookup feature to retrieve this information and add it to your preparation.

Procedure

  1. Select the column that is common to both your main preparation and your reference dataset, state_code in this example.
    In order to perform a lookup, at least one column with common data must be present in the preparation and dataset that you want to blend.
  2. Click the lookup button in the upper right part of the screen to open the lookup panel.

    The Add data from lookup panel opens at the bottom of the screen.

  3. Click the button.
  4. In the dialog box that opens, select the dataset you want to use to perform the lookup and click Add.
    In this case, you need to add the dataset containing the list of two-letter State codes and their matching names. It must have been imported to Talend Data Preparation beforhand.

    The second dataset opens in the bottom part of the screen.

  5. In both your preparation and reference dataset, click the column containing the State codes so that they are highlighted.
  6. Select the Add to Dataset check box under every column you want to include in your lookup, State in this example.
  7. Point your mouse over the Confirm button to preview the changes, and click it to apply them.
    The State name for each matching State code between main preparation and reference dataset was retrieved.

    The only exception that you can see here is the second row, with the DC State code, that could not be matched with a full State name. The reason is that DC is not actually a proper State code, hence the invalid status in the first place. Logically, DC is not referenced in the second dataset, that only contains official values, and no match was found for this value in your preparation.

    The lookup feature behaves as follows:

    • If a value is common to both your main preparation and the reference dataset, the additional information is retrieved.
    • If a value exists in your preparation but not in the reference dataset, the lookup operation will leave an empty cell in the resulting column.
    • If a value from the reference dataset does not find a match in the main preparation, the additional information will simply not be retrieved.

    The logic behind the lookup operation in Talend Data Preparation corresponds to a left outer join when speaking in terms of Venn diagrams: