Leveraging semantic awareness to transform state codes into states using Magic Fill - 7.3

Talend Data Preparation User Guide

author
Talend Documentation Team
EnrichVersion
7.3
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
task
Data Quality and Preparation > Cleansing data
EnrichPlatform
Talend Data Preparation

Based on the example that you will input, the Magic Fill function also has the ability to recognize and format data that corresponds to certain semantic types: countries, emails, URLs, dates, as well as US postal codes and states.

Let's take the example of a dataset containing some customer information, like their names, email addresses, or the state they live in. This scenario will illustrate how you can use the Magic Fill function to convert the two-letter state codes to their full state name equivalent, to have them in a format that is easier to read.

Procedure

  1. Click the header of the state column in order to select its content.
  2. In the functions panel, type Magic fill and click the result to display the options of the associated function.
  3. In the Input 1 field, enter one of the values from the state column that you would like to transform, NY for example.
  4. In the Output 1 field, enter the full name of the corresponding state: New-York.
    For the function to work, you need to enter at least two complete examples of the transformation you want to apply. You can then add up to three other examples. Examples can be taken from your dataset, or you can use other valid US state code values from outside your dataset. The more examples you input, the more accurately the transformation will be identified by the function.
  5. Enter more before and after examples in the remaining fields:
    • AZ as Input 2 and Arizona as Output 2
    • TX as Input 3 and Texas as Output 3
    • IN as Input 4 and Indiana as Output 4
    • MI as Input 5 and Michigan as Output 5

    Based on these examples, the function will understand that the all the inputs you entered are two-letter state codes, and the outputs are full state names, and will apply the corresponding transformation to the rest of the column.

  6. Click Submit.

Results

A new column is created, where the formatting operation defined by your examples has been applied to the rest of the state codes. You can see from the column header that the new values all correspond to the US State semantic type, ensuring that they are in the good format. You can now delete the state column.