Using the email domains from another dataset - 8.0

Talend Data Preparation Examples

Version: 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data
Last publication date: 2024-01-12

The lookup feature matches each row of the current dataset with its counterpart in a reference dataset, based on a column that both datasets have in common.

On the one hand, the marketing_leads dataset, which you are currently working on, contains the company where each listed customer works. On the other hand, the emails_reference dataset contains a list of companies and the email domain that each of them uses.

You are going to perform a lookup on the emails_reference dataset to retrieve the email domain of each company listed in the marketing_leads dataset.
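Conceptually, this lookup behaves like a dictionary keyed on company names. The following Python sketch is only an illustration, not how Talend Data Preparation implements the feature: the sample rows are hypothetical, and both the empty result for unmatched companies and keeping the first domain when a company appears twice are assumptions.

```python
# Hypothetical sample rows standing in for the emails_reference dataset.
emails_reference = [
    {"company_name": "Acme", "email_domain": "acme.com"},
    {"company_name": "Globex", "email_domain": "globex.net"},
]

# Index the reference dataset by company name; keep the first domain if a name repeats.
domain_by_company = {}
for row in emails_reference:
    domain_by_company.setdefault(row["company_name"], row["email_domain"])


def lookup_domain(company):
    """Return the email domain of a company, or "" if the reference has no match."""
    return domain_by_company.get(company, "")


print(lookup_domain("Acme"))     # acme.com
print(lookup_domain("Initech"))  # prints an empty line: no match in the reference
```

This sketch matches company names exactly, so "ACME" and "Acme" would be treated as different companies.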

Before you begin

Before you can perform the lookup, import the emails_reference dataset by using the Add dataset button in the Datasets view of the homepage.

Procedure

  1. Select the column on which you want to perform the lookup, the company column in this example.
    This column has a counterpart in the reference dataset, the company_name column in this example. A lookup always requires a column that the two datasets have in common, even if its name differs from one dataset to the other.
  2. Click the lookup button to open the lookup panel.
  3. Click the button and, in the dialog box that opens, select the dataset you want to use to perform the lookup, the emails_reference dataset in this example.
  4. Click Add.
  5. In the lookup window that opens in the bottom half of your screen, click the company_name column, which holds the values that match the company column.
  6. Select the Add to Dataset check box.
  7. Point your mouse over the Confirm button to preview the changes.
  8. Click the Confirm button to apply those changes.
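The procedure above can be approximated in plain Python as a left join on the common column. In this sketch, the hypothetical `lookup` function takes the key column selected in step 1, the reference column selected in step 5, and the columns to add; inserting the added columns right after the key column mirrors the result described in this topic, while the sample rows and the empty-string default for unmatched rows are assumptions.

```python
def lookup(rows, key, reference, ref_key, add_columns):
    """Return copies of rows with add_columns from reference inserted after key."""
    # Index the reference rows by their matching column (first occurrence wins).
    index = {}
    for ref_row in reference:
        index.setdefault(ref_row[ref_key], ref_row)

    result = []
    for row in rows:
        match = index.get(row[key], {})  # empty dict when there is no match
        new_row = {}
        for column, value in row.items():
            new_row[column] = value
            if column == key:
                # Place the added columns right next to the key column.
                for added in add_columns:
                    new_row[added] = match.get(added, "")
        result.append(new_row)
    return result


# Hypothetical sample data.
marketing_leads = [
    {"first_name": "Ada", "last_name": "Lovelace", "company": "Acme", "country": "UK"},
    {"first_name": "Alan", "last_name": "Turing", "company": "Initech", "country": "UK"},
]
emails_reference = [
    {"company_name": "Acme", "email_domain": "acme.com"},
    {"company_name": "Globex", "email_domain": "globex.net"},
]

enriched = lookup(marketing_leads, "company", emails_reference, "company_name", ["email_domain"])
for row in enriched:
    print(row)  # email_domain now sits right after company
```

Unlike the Confirm step, this function returns new rows instead of modifying the dataset in place, which loosely corresponds to previewing the changes before applying them.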

Results

The email_domain column is added to the marketing_leads dataset, next to the company column.

In a later step, these email domains are combined with the first names and last names from the duplicated column to create complete email addresses.
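The email address format itself is not specified here. As a hedged sketch, assuming addresses take the form first.last@domain in lowercase, the hypothetical `build_email` helper below combines a first name, a last name, and the looked-up domain; adapt the format to your own convention.

```python
def build_email(first_name, last_name, email_domain):
    """Build first.last@domain in lowercase (hypothetical format); "" if a part is missing."""
    if not (first_name and last_name and email_domain):
        return ""
    # Drop spaces so that multi-word names still give a single local part.
    local_part = f"{first_name}.{last_name}".replace(" ", "").lower()
    return f"{local_part}@{email_domain.lower()}"


print(build_email("Ada", "Lovelace", "acme.com"))  # ada.lovelace@acme.com
print(build_email("Alan", "Turing", ""))           # empty: no domain was found
```

Returning an empty string when the lookup found no domain avoids producing incomplete addresses such as "alan.turing@".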