Extracting phone number information - Cloud

Talend Cloud Data Preparation User Guide

author
Talend Documentation Team

You can use the Extract phone number information function to extract new types of information about phone numbers into several new columns.

This function is able to extract information about the phone type, the country, the region, the geographic area, the carrier name and the timezone. However, the behaviour of the function depends on the semantic type of the column containing the phone number data:

  • If the semantic type corresponds to either US Phone, UK Phone, DE Phone or FR Phone, you can simply select which fields you want to output and apply the function.
  • If the column contains numbers from different countries, in different formats, and the matching semantic type is the more generic Phone number, you need to format the numbers before you can use the Extract phone number information function. This step is necessary because a number that is not standardized often has a structure that is valid in several countries, which makes it impossible to determine its country uniquely.

Take the example of a dataset containing basic customer information, such as names, countries and phone numbers, for clients all over the world. Your goal with this preparation is to keep only the customers who gave a mobile phone number as contact information. The Extract phone number information function can return the phone type, but because the numbers are in various formats, you cannot apply it yet. You first perform a formatting operation on the phone column, using the information from the country column, to add an international prefix to the numbers. Talend Data Preparation can then extract the information from the phone numbers, which are in a harmonized format and now identify their respective countries.
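To make the idea behind the formatting step concrete, here is a minimal Python sketch of how a country column can be used to add an international prefix. It is not how Talend Data Preparation implements the Format phone numbers function: the `CALLING_CODES` table, the `TRUNK_ZERO` set and the `to_international` helper are hypothetical names introduced for illustration, the table covers only four countries, and the output is a compact `+<code><digits>` string rather than the spaced International format shown in the application.

```python
import re

# Assumption: a tiny illustrative table. Real formatting relies on complete
# per-country numbering metadata, not on four hard-coded entries.
CALLING_CODES = {"FR": "33", "GB": "44", "DE": "49", "US": "1"}
# Countries in this table whose national numbers are written with a leading trunk "0".
TRUNK_ZERO = {"FR", "GB", "DE"}


def to_international(raw_number: str, country: str) -> str:
    """Return '+<calling code><national digits>', or '' if the row cannot be formatted."""
    stripped = raw_number.strip()
    digits = re.sub(r"\D", "", stripped)  # keep digits only
    if not digits or country not in CALLING_CODES:
        return ""
    # A number already written with a '+' prefix is only normalized.
    if stripped.startswith("+"):
        return "+" + digits
    if country in TRUNK_ZERO and digits.startswith("0"):
        digits = digits[1:]  # drop the national trunk prefix
    elif country == "US" and len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading "1" that is already the calling code
    return "+" + CALLING_CODES[country] + digits


# The same national digits lead to different international numbers
# depending on the country column, which is why that column is needed.
print(to_international("06 12 34 56 78", "FR"))
print(to_international("06 12 34 56 78", "DE"))
```

The two calls at the end use identical digits with two different country values, which illustrates why the country cannot be inferred from a non-standardized number alone.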

Procedure

  1. Click the header of the phone column to select its content.
  2. In the functions panel, select the Format phone numbers function, apply it using the information from the country column and set the output to the International format.

    The phone numbers are now in a single format, with the international code as a prefix. The country can now be uniquely identified from each phone number, and the additional information can be extracted.

    For more information on how to use the Format phone numbers function with another column, see Formatting phone numbers.

  3. In the functions panel, type Extract phone number information and click the result to open the options for the associated function.
  4. Select the check boxes corresponding to the categories of information that you want to extract.

    Each category is exported to a new column. For this example, leave the Phone number region check box unselected because the dataset already contains information about the region, in the form of country codes.

  5. In the Language drop-down list, select the language in which you want the information to be output, English in this example.
  6. Click Submit.

Results

After a quick formatting step, the columns containing the information extracted from the phone numbers have been created. The information is extracted by the Google phone library. You can now easily identify which numbers belong to a fixed line or to a mobile phone and continue your preparation.

Rows that were empty or invalid generate empty cells after the function has been applied.
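The behavior described above (one new column per selected category, empty cells for empty or invalid rows, then keeping only mobile numbers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TOY_RULES` and `extract_info` are hypothetical, the prefix rules are rough approximations for two countries rather than the metadata used by the Google phone library, and the `mobile` and `fixed line` labels and the column names are placeholders, not the values Talend Data Preparation outputs.

```python
# Assumption: toy rules for two countries only. The real extraction relies on
# far more detailed numbering metadata.
TOY_RULES = {
    "+33": {"country": "France", "mobile_prefixes": ("6", "7"), "national_length": 9},
    "+44": {"country": "United Kingdom", "mobile_prefixes": ("7",), "national_length": 10},
}


def extract_info(number: str, fields: tuple) -> dict:
    """Return one value per requested field; values stay '' for empty or unrecognized numbers."""
    result = {field: "" for field in fields}
    for prefix, rule in TOY_RULES.items():
        national = number[len(prefix):]
        if (number.startswith(prefix) and national.isdigit()
                and len(national) == rule["national_length"]):
            values = {
                "type": "mobile" if national.startswith(rule["mobile_prefixes"]) else "fixed line",
                "country": rule["country"],
            }
            # Only the categories that were selected are filled in.
            result.update({f: values[f] for f in fields if f in values})
    return result


# Numbers are assumed to be already formatted with an international prefix.
rows = [
    {"name": "Customer A", "phone": "+33612345678"},
    {"name": "Customer B", "phone": "+33123456789"},
    {"name": "Customer C", "phone": "+447700900123"},
    {"name": "Customer D", "phone": ""},
    {"name": "Customer E", "phone": "not a number"},
]

# Each selected category becomes a new column on every row.
for row in rows:
    extracted = extract_info(row["phone"], ("type", "country"))
    row["phone_type"] = extracted["type"]
    row["phone_country"] = extracted["country"]

# Keep only the customers who gave a mobile number.
mobile_only = [row for row in rows if row["phone_type"] == "mobile"]
print([row["name"] for row in mobile_only])
```

Rows D and E end up with empty strings in both new columns, mirroring the empty cells produced for empty or invalid input, and they are dropped by the mobile filter.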