Retrieving the first non empty value of several columns - 7.3

Talend Data Preparation User Guide

Version
7.3
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Data Preparation
Content
Data Quality and Preparation > Cleansing data
Last publication date
2023-11-28

With the Coalesce columns function, you can easily retrieve the first non null value across different columns to consolidate their data into a new column.

In this example, you are working on a dataset that contains information about your customers, including the different phone numbers at which you can reach them: office, mobile or home number. You can notice that some values are empty. Because you do not necessarily need to have 3 columns with similar information in your dataset, you can choose to create a new one containing only one phone number for each row, and make sure that the empty values are replaced with usable information at the same time. Here, you want to prioritize the mobile phone numbers, and retrieve the office or home number as backup.

Procedure

  1. To prioritize the mobile phone number over the other numbers, drag the phone_mobile column header and drop it left of the phone_office column.
    The columns are now ordered as following:
  2. Use Ctrl + click or Shift + click to select the headers of the three columns containing phone numbers.
  3. In the Functions panel, type Coalesce columns and click the result to display the options of the associated function.
  4. In the Direction drop-down list, select Left to right.
    With the column order that you have set up, the function will start by reading the phone_mobile column, and will replace any empty value by the next non empty on its right.
  5. Leave the default setting for the Values containing only non-printing characters drop-down list, namely Consider empty.
  6. Click Submit.

Results

A new consolidated column is created, where the missing mobile phone numbers are replaced by the values of the next columns. You can now delete the three source columns to only keep the new one, and rename it with a meaningful name if you want.