Blending data - Cloud

Talend Cloud Data Preparation Getting Started Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Preparation
Last publication date: 2024-03-05

The lookup feature allows you to take data from an existing dataset and add it to your preparation.

This example assumes that:

  • You have downloaded and extracted the states.zip file.
  • You have added states.csv to your list of datasets in Talend Cloud Data Preparation. For more information about how to import a dataset, see Opening a dataset from a local file.

In this example, you want to add geographical information about your customers by using a reference file that you already have: the States dataset. This dataset contains the list of US state codes and their corresponding regions. You will use the data from this dataset dynamically to complement your preparation, adding each customer's subscription region based on their state code.

To blend data from another dataset into your preparation, proceed as follows:

Procedure

  1. Open your preparation.
  2. Click the lookup button in the upper right part of the screen to open the lookup panel.
  3. Click Select dataset to select an existing dataset.
  4. Select the dataset you want to use to perform the lookup, the states dataset in this example.
  5. Click Select.
  6. From the Current preparation and Lookup dataset drop-down lists, select the matching columns in your main preparation and in your reference dataset: the State column in both, in this example.

    To perform a lookup, the preparation and the dataset that you want to blend must each contain at least one column with matching data. In this example, the State column in both contains the US state codes.
  7. From the Columns to add drop-down list, select the Region column to add it to the current preparation.
  8. Choose whether to apply the changes to the filtered rows only or to all rows.
  9. Click Submit to apply the changes and add the Region column to your preparation.
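Conceptually, the lookup behaves much like a left join keyed on the matching columns: every row of the preparation is kept, and the selected columns from the lookup dataset are appended to it. The following Python sketch illustrates that idea only; it is not Talend's implementation or API. The function name `lookup`, its `only_rows` parameter (standing in for the filtered-rows option of step 8), the sample rows, and the behaviors noted in the comments (first match wins; rows without a match or outside the filter get an empty value) are all assumptions made for illustration. The sample data is made up and does not reflect the contents of states.csv.

```python
from typing import Callable, Optional

Row = dict[str, str]


def lookup(
    preparation: list[Row],
    lookup_dataset: list[Row],
    current_key: str,
    lookup_key: str,
    columns_to_add: list[str],
    only_rows: Optional[Callable[[Row], bool]] = None,
) -> list[Row]:
    """Hypothetical sketch: append columns_to_add from lookup_dataset to each
    preparation row whose current_key value matches lookup_key."""
    # Index the reference dataset by its matching column.
    # Assumption: if a key appears several times, the first row wins.
    index: dict[str, Row] = {}
    for ref_row in lookup_dataset:
        index.setdefault(ref_row[lookup_key], ref_row)

    result: list[Row] = []
    for row in preparation:
        new_row = dict(row)  # copy so the input rows are left untouched
        # Assumption: rows outside the filter, or without a match, get "".
        selected = only_rows is None or only_rows(row)
        match = index.get(row[current_key]) if selected else None
        for column in columns_to_add:
            new_row[column] = match[column] if match is not None else ""
        result.append(new_row)
    return result


# Hypothetical sample data (not taken from states.csv).
customers = [
    {"Name": "Customer A", "State": "CA"},
    {"Name": "Customer B", "State": "NY"},
    {"Name": "Customer C", "State": "ZZ"},  # no matching state code
]
states = [
    {"State": "CA", "Region": "West"},
    {"State": "NY", "Region": "Northeast"},
]

for row in lookup(customers, states, "State", "State", ["Region"]):
    print(row)
```

To mimic applying the lookup to filtered rows only, pass a predicate, for example `only_rows=lambda r: r["State"] == "CA"`; in this sketch, rows that do not satisfy it still receive the Region column, but with an empty value.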

Results

Your preparation now includes a new Region column containing the subscription region of each customer, retrieved from the reference dataset.