Automatically formatting data based on examples - Cloud

Talend Cloud Data Preparation User Guide

Talend Documentation Team
Talend Cloud
Talend Data Preparation

The Magic Fill function offers a convenient way to format data types that do not have a dedicated function, or to perform a succession of transformations with a single function.

Note: This function is not compatible with Spark Jobs, or with HDFS or S3 exports.

Using a machine learning algorithm, this function lets you define a pattern from a few examples that you provide beforehand, and then automatically applies the corresponding transformation to a whole column.

At the moment, the Magic Fill function supports the following transformation types:

  • substring
  • addition of constants (numbers, letters, special characters)
  • case changes (for example, converting to uppercase or lowercase)
  • semantic transformation for countries, US postal codes and states, emails, URLs and dates
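The guide does not describe the algorithm behind Magic Fill. As a rough illustration of the general idea of programming by example, restricted to substrings, constants, and case changes, the following Python sketch searches for a small program that reproduces every example and then applies it to new values. It is a simple enumerative search, not machine learning, and every name in it (`TRANSFORMS`, `run`, `parses`, `synthesize`) is hypothetical. It assumes that values can be split into alphanumeric tokens.

```python
import re
from typing import Iterator, Optional

# Hypothetical toy DSL: a program is a sequence of pieces, each either a
# constant string or a case-transformed token (substring) of the input.
TRANSFORMS = {
    "same": lambda t: t,
    "upper": str.upper,
    "lower": str.lower,
    "capitalize": str.capitalize,
    "initial": lambda t: t[:1].upper(),
}

Piece = tuple  # ("const", text) or ("token", index, transform_name)


def tokenize(value: str) -> list[str]:
    # Split the value into runs of letters and digits.
    return re.findall(r"[A-Za-z0-9]+", value)


def run(program: tuple[Piece, ...], value: str) -> Optional[str]:
    """Apply a program to one value; return None if it does not apply."""
    tokens = tokenize(value)
    out = []
    for piece in program:
        if piece[0] == "const":
            out.append(piece[1])
        elif piece[1] < len(tokens):
            out.append(TRANSFORMS[piece[2]](tokens[piece[1]]))
        else:
            return None  # the value has too few tokens for this program
    return "".join(out)


def parses(value: str, target: str) -> Iterator[tuple[Piece, ...]]:
    """Enumerate programs that turn `value` into `target`, longest token matches first."""
    tokens = tokenize(value)

    def walk(pos: int) -> Iterator[tuple[Piece, ...]]:
        if pos == len(target):
            yield ()
            return
        matches = []
        for k, tok in enumerate(tokens):
            for name, f in TRANSFORMS.items():
                text = f(tok)
                if text and target.startswith(text, pos):
                    matches.append((len(text), ("token", k, name)))
        matches.sort(key=lambda m: -m[0])
        for length, piece in matches:
            for rest in walk(pos + length):
                yield (piece,) + rest
        # Fallback: copy one output character as a constant.
        for rest in walk(pos + 1):
            yield (("const", target[pos]),) + rest

    return walk(0)


def synthesize(examples: list[tuple[str, str]]) -> Optional[tuple[Piece, ...]]:
    """Return the first program built from the first example that fits all examples."""
    first_in, first_out = examples[0]
    for program in parses(first_in, first_out):
        if all(run(program, src) == dst for src, dst in examples):
            return program
    return None


examples = [("john smith", "Smith, J."), ("ada lovelace", "Lovelace, A.")]
program = synthesize(examples)
print([run(program, v) for v in ["grace hopper", "alan turing"]])
# → ['Hopper, G.', 'Turing, A.']
```

In this sketch, `run` returns `None` when a value has fewer tokens than the program expects, which is how it signals that a program does not apply to that value.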

For the function to work, you need to enter at least two examples of the transformation you want to apply. You can then add up to three more examples. The more examples you enter, the more accurately the function identifies the pattern.
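Why a single example is not enough can be illustrated with a deliberately tiny hypothesis space. The Python sketch below assumes, hypothetically, that the only candidate transformations are whole-value case changes, and counts which ones reproduce the examples. One state-code example leaves two candidates; a second example narrows them to one. The names `HYPOTHESES`, `consistent`, and `check_count` are illustrative only, and the 2-to-5 limit mirrors the rule stated above.

```python
# Hypothetical hypothesis space: whole-value case transformations only.
HYPOTHESES = {
    "same": lambda s: s,
    "upper": str.upper,
    "lower": str.lower,
    "capitalize": str.capitalize,
}

MIN_EXAMPLES = 2  # at least two examples are required...
MAX_EXAMPLES = 5  # ...plus up to three more


def consistent(examples: list[tuple[str, str]]) -> list[str]:
    """Names of the hypotheses that reproduce every (input, output) example."""
    return [
        name
        for name, f in HYPOTHESES.items()
        if all(f(src) == dst for src, dst in examples)
    ]


def check_count(examples: list[tuple[str, str]]) -> None:
    """Reject example sets outside the supported range."""
    if not MIN_EXAMPLES <= len(examples) <= MAX_EXAMPLES:
        raise ValueError(
            f"enter between {MIN_EXAMPLES} and {MAX_EXAMPLES} examples, got {len(examples)}"
        )


examples = [("NY", "NY"), ("ca", "CA")]
print(consistent(examples[:1]))  # → ['same', 'upper']
check_count(examples)            # two examples: accepted
print(consistent(examples))      # → ['upper']
```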

If the transformation program generated by the function does not apply to some values in the source column, those values remain unchanged in the target column.
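This fallback behavior can be sketched in Python, assuming (hypothetically) that a generated program is a function that returns `None` for values it cannot transform. Here `format_ssn` is a hand-written stand-in for a generated program and `fill_column` is an illustrative name; neither is part of Talend Data Preparation.

```python
import re
from typing import Callable, Optional


def format_ssn(value: str) -> Optional[str]:
    """Stand-in for a generated program: '123456789' -> '123-45-6789'."""
    match = re.fullmatch(r"(\d{3})(\d{2})(\d{4})", value)
    if match is None:
        return None  # the program does not apply to this value
    return "-".join(match.groups())


def fill_column(values: list[str], program: Callable[[str], Optional[str]]) -> list[str]:
    """Build the target column, copying source values the program cannot handle."""
    result = []
    for value in values:
        transformed = program(value)
        result.append(value if transformed is None else transformed)
    return result


source = ["123456789", "987654321", "N/A", "12-345"]
print(fill_column(source, format_ssn))
# → ['123-45-6789', '987-65-4321', 'N/A', '12-345']
```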

Data types such as dates and phone numbers have dedicated functions that you can use to easily change their format. However, full names, social security numbers, or state codes, for example, do not. The following scenarios illustrate how to use the Magic Fill function to format your data in those cases.