Using the email domains from another dataset - Cloud

Talend Cloud Data Preparation Quick Examples

Version: Cloud
Language: English (United States)
Product: Talend Cloud
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data

The lookup feature matches data from the current dataset with its counterpart in a reference dataset.

In this example, you are working on the marketing_leads dataset, which contains information about the companies where the listed customers work. A second dataset, emails_reference, contains a list of companies and the email domain that each of them uses.

You are going to perform a lookup on the emails_reference dataset to retrieve the email domains and match them with the companies listed in the marketing_leads dataset.
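Conceptually, this lookup behaves like a left join on the company name. The following Python sketch is only an illustration of that idea, not how Talend Data Preparation implements it: the sample rows, the lookup function, and the choice to leave unmatched rows with an empty value are all assumptions made for the example. Only the dataset and column names (company, company_name, email_domain) come from this scenario.

```python
# Hypothetical sample rows; the actual content of both datasets is not shown in this example.
marketing_leads = [
    {"first_name": "Jane", "last_name": "Doe", "company": "Acme"},
    {"first_name": "John", "last_name": "Smith", "company": "Globex"},
    {"first_name": "Mary", "last_name": "Major", "company": "Initech"},
]
emails_reference = [
    {"company_name": "Acme", "email_domain": "acme.example"},
    {"company_name": "Globex", "email_domain": "globex.example"},
]


def lookup(rows, reference, left_key, right_key, column_to_add):
    """Return copies of rows, each enriched with column_to_add from the matching reference row."""
    # Index the reference dataset by its matching column; the first occurrence wins.
    index = {}
    for ref_row in reference:
        index.setdefault(ref_row[right_key], ref_row[column_to_add])

    enriched = []
    for row in rows:
        new_row = dict(row)  # copy, so the original rows are left untouched
        # Assumption: rows without a match get an empty value in the new column.
        new_row[column_to_add] = index.get(row[left_key], "")
        enriched.append(new_row)
    return enriched


enriched_leads = lookup(
    marketing_leads, emails_reference, "company", "company_name", "email_domain"
)
for lead in enriched_leads:
    print(lead["company"], "->", lead["email_domain"] or "(no match)")
```

In this sketch, Initech has no counterpart in the reference rows, so its email_domain value stays empty while the two other leads receive their domain.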

Before you begin

To perform the lookup on the emails_reference dataset, you must first import it using the Add dataset button in the Datasets view of the homepage.

Procedure

  1. Open the marketing_leads preparation.
  2. Click the lookup button in the upper right part of the screen to open the lookup panel.

  3. Click Select dataset to select an existing dataset.
  4. Select the dataset you want to use to perform the lookup, the emails_reference dataset in this example.
  5. From the Current preparation and Lookup dataset drop-down lists, select the matching columns in your main preparation and your reference dataset, the company and company_name columns in this example.
    To perform a lookup, at least one column with matching data must be present in both the preparation and the dataset that you want to blend.
  6. From the Columns to add drop-down list, select the column containing the email domains to add it to the current dataset, the email_domain column in this example.
  7. Choose whether to apply the changes only to the filtered rows or to all rows.
  8. Click Submit to apply the changes.

Results

The email_domain column is added to the marketing_leads dataset, next to the company column.

These email domains will then be combined with the first names and last names from the duplicated column to create the complete email addresses.
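As a rough illustration of that final combination, the sketch below assembles an address from a first name, a last name, and a domain. The build_email function and the first.last@domain format are assumptions made for this example; the address format actually used in the preparation is not specified here.

```python
def build_email(first_name, last_name, email_domain):
    """Build an address in the assumed first.last@domain format.

    Returns an empty string when no domain is available for the row.
    """
    if not email_domain:
        return ""
    # Assumption: the local part is lowercased and stripped of spaces.
    local_part = f"{first_name}.{last_name}".lower().replace(" ", "")
    return f"{local_part}@{email_domain}"


print(build_email("Jane", "Doe", "acme.example"))  # prints jane.doe@acme.example
```

Returning an empty string for rows without a domain avoids producing incomplete addresses such as "jane.doe@".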