Blending data - Cloud

Talend Cloud Data Preparation Getting Started Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Preparation

The lookup feature allows you to take data from an existing dataset and add it to your preparation.

This example assumes that:

  • You have retrieved the states.csv file from the Downloads tab of the documentation page.
  • You have added states.csv to your list of datasets in Talend Cloud Data Preparation. For more information about how to import a dataset, see Opening a dataset from a local file.

In this example, you want to add more geographical information about your customers by using a reference file: the States dataset. This dataset contains the list of US state codes and their corresponding regions. You will use the data from this dataset to complement your preparation by adding each customer's subscription region, based on their state code.

To blend the data from another dataset in your preparation, proceed as follows:

Procedure

  1. Open your preparation.
  2. Click the lookup button in the upper right part of the screen to open the lookup panel.

  3. Click Select dataset to select an existing dataset.
  4. Select the dataset you want to use for the lookup: the states dataset in this example.
  5. Click Select.
  6. From the Current preparation and Lookup dataset drop-down lists, select the matching columns in your main preparation and your reference dataset: the State and States Code columns in this example.

    To perform a lookup, the preparation and the dataset that you want to blend must each contain at least one column with matching data: the US state codes in this case.

  7. From the Columns to add drop-down list, select the Region column to add it to the current dataset.

  8. Choose whether to apply the changes only to the filtered rows or to all rows.
  9. Click Submit to apply the changes and add the Region column to your preparation.

Results

Your data now includes new information about your customers' subscription region, extracted from a reference file.
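
Conceptually, the lookup described above behaves like a keyed join between the preparation and the reference dataset. The following Python sketch illustrates that idea only; it is not how Talend Cloud Data Preparation implements the feature. The sample rows, the `lookup` helper, and the choice to leave unmatched rows with an empty value are all assumptions made for illustration. Only the State, States Code, and Region column names come from this example.

```python
import csv
import io

# Hypothetical sample data; the real customer preparation and states.csv differ.
CUSTOMERS_CSV = """Name,State
Alice,CA
Bob,NY
Carol,ZZ
"""

STATES_CSV = """States Code,Region
CA,West
NY,Northeast
"""


def lookup(main_rows, ref_rows, main_key, ref_key, columns_to_add):
    """Return copies of main_rows enriched with columns from matching ref_rows."""
    # Index the reference dataset by its key column for direct matching.
    index = {row[ref_key]: row for row in ref_rows}
    result = []
    for row in main_rows:
        match = index.get(row[main_key])
        enriched = dict(row)
        for col in columns_to_add:
            # Assumption: a row with no match receives an empty value.
            enriched[col] = match[col] if match else ""
        result.append(enriched)
    return result


customers = list(csv.DictReader(io.StringIO(CUSTOMERS_CSV)))
states = list(csv.DictReader(io.StringIO(STATES_CSV)))

# Match State (preparation) against States Code (reference) and add Region.
blended = lookup(customers, states, "State", "States Code", ["Region"])
for row in blended:
    print(row)
```

In this sketch, the two key arguments play the role of the Current preparation and Lookup dataset drop-down lists, and `columns_to_add` plays the role of the Columns to add list.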