Automatically formatting data based on examples - 7.3

Talend Data Preparation User Guide

author
Talend Documentation Team

The Magic Fill function offers a convenient way to format data for which no dedicated function exists, or to perform a succession of transformations with a single function.

Note: This function is not compatible with Spark Jobs, or with HDFS or S3 exports.

Using a machine learning algorithm, this function identifies a pattern from a few examples that you define beforehand, and automatically applies the corresponding transformation to a whole column.

At the moment, the Magic Fill function supports the following transformation types:

  • substring
  • addition of constants (numbers, letters, special characters)
  • case changes
  • semantic transformation for countries, US postal codes and states, emails, URLs and dates
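To make these transformation types concrete, the following Python sketch shows one input/output pair for each. It is only an illustration of the kind of result to expect: the function names, sample values, and the small country lookup are all hypothetical, and none of this reflects how Magic Fill is implemented internally.

```python
# Hypothetical illustrations of the four transformation types.
# This is not Talend code; names and sample data are assumptions.

# Tiny illustrative lookup standing in for a semantic dictionary.
COUNTRY_CODES = {"France": "FR", "Germany": "DE"}


def substring(value: str) -> str:
    """Substring: keep only the last four characters."""
    return value[-4:]


def add_constant(value: str) -> str:
    """Addition of constants: prefix the value with a fixed string."""
    return "ID-" + value


def change_case(value: str) -> str:
    """Case change: convert the value to uppercase."""
    return value.upper()


def semantic_country(value: str) -> str:
    """Semantic transformation: map a country name to a code.

    Unknown values are returned as they are.
    """
    return COUNTRY_CODES.get(value, value)


if __name__ == "__main__":
    print(substring("INV-2023-0042"))   # 0042
    print(add_constant("0042"))         # ID-0042
    print(change_case("smith"))         # SMITH
    print(semantic_country("France"))   # FR
```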

For the function to work, you need to enter at least two examples of the transformation you want to apply. You can then add up to three more examples, for a maximum of five. The more examples you provide, the more accurately the function identifies the pattern.

If the transformation program generated by the function does not apply to some of the values in the source column, those values remain unchanged in the target column.
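The behavior described above can be approximated with a small Python sketch. It assumes a fixed list of candidate programs and simply picks the first one that reproduces every example, which is far simpler than the machine learning algorithm the product uses. All names here (`fill_by_example`, `infer_program`, the candidate functions) are hypothetical and are not part of any Talend API.

```python
from typing import Callable, List, Optional, Tuple

# A "program" maps a source value to a target value,
# or returns None when it does not apply to that value.
Program = Callable[[str], Optional[str]]


def to_upper(value: str) -> Optional[str]:
    """Hypothetical candidate: uppercase the whole value."""
    return value.upper()


def initial_and_last_name(value: str) -> Optional[str]:
    """Hypothetical candidate: 'john smith' -> 'J. SMITH'.

    Applies only to values made of exactly two words.
    """
    parts = value.split()
    if len(parts) != 2:
        return None
    first, last = parts
    return f"{first[0].upper()}. {last.upper()}"


CANDIDATES: List[Program] = [to_upper, initial_and_last_name]


def infer_program(examples: List[Tuple[str, str]]) -> Optional[Program]:
    """Return the first candidate consistent with all examples."""
    if not 2 <= len(examples) <= 5:
        raise ValueError("provide between two and five examples")
    for program in CANDIDATES:
        if all(program(src) == dst for src, dst in examples):
            return program
    return None


def fill_by_example(column: List[str],
                    examples: List[Tuple[str, str]]) -> List[str]:
    """Apply the inferred program; keep values it does not cover."""
    program = infer_program(examples)
    if program is None:
        return list(column)
    result = []
    for value in column:
        transformed = program(value)
        # Values the program cannot handle stay unchanged.
        result.append(value if transformed is None else transformed)
    return result


if __name__ == "__main__":
    examples = [("john smith", "J. SMITH"), ("mary jones", "M. JONES")]
    column = ["john smith", "mary jones", "cher", "ana de armas"]
    print(fill_by_example(column, examples))
```

In this sketch, the one-word and three-word names do not match the inferred two-word pattern, so they are copied to the output as they are, mirroring how unmatched source values are left unchanged in the target column.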

Data types such as dates or phone numbers have dedicated functions that can be used to easily change their format. However, full names, social security numbers, or state codes, for example, do not. The following scenarios illustrate how to use the Magic Fill function to format your data in those cases.