List of functions - 7.3

Talend Data Preparation User Guide

Version
7.3
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Data Preparation
Content
Data Quality and Preparation > Cleansing data
Last publication date
2023-11-28

These tables list all the functions available in Talend Data Preparation and their effects.

Applying any of the functions listed below, with the exception of Math functions, will return the input value if the data is considered invalid or if the data type is incompatible with the selected function, no matter if the Create new column check box is selected or not.

In the case of Math functions, empty values are ignored, while invalid values will lead to an empty result when the function is applied.

Note:

The following functions, that rely on matching algorithms, do not support Asian characters:

  • Standardize value (fuzzy matching)
  • Find and group similar text

Functions that can be applied on rows

Name Category Description
Delete row data cleansing Deletes this row
Make as header data cleansing Cells of this row will become columns names, the row will be deleted

Functions that can be applied on columns

Name Category Description
Negate value boolean Reverse the boolean value of cells from this column
Change data type column metadata Change type of this column (number, text, date, etc.)
Change semantic domain column metadata Change semantic domain of this column (city, zipcode, last name, etc.)
Create new column column metadata Copy a column or create a brand new one
Note: When used in Sequence mode, this function is not compatible with Spark Jobs, and HDFS or S3 exports.
Delete column column metadata Delete the selected columns
Duplicate column column metadata Create an exact copy of this column
Rename column column metadata Rename this column
Concatenate columns columns Merge the content of two or more columns into a new one
Concatenate with columns Merge the content of this column with another one, and displays it in a new column
Reorder columns columns Change column order
Swap columns columns Swap the values with an other column
Convert country names and codes conversions Convert a country name to an ISO-3166 country code (alpha-2, alpha-3 or numeric) or vice versa. This function supports English, French and Japanese names and codes.
Convert distance conversions Convert distance from one unit to another
Convert duration conversions Converts duration from one unit to another
Convert temperature conversions Converts temperature measurement units
Clear on matching value data cleansing Clears cells that match the value
Clear the cells with invalid values data cleansing Clears cells that contain a value recognized as invalid
Coalesce columns data cleansing Retrieves the first not null value for each row among the selected columns
Delete empty rows data cleansing Removes rows where the cell in each column is empty
Delete the rows that match data cleansing Deletes rows where a cell in this column has a specific value
Delete the rows with empty cell data cleansing Deletes rows that have empty cells
Delete the rows with invalid cell data cleansing Deletes rows which contain an invalid cell
Delete the rows with negative values data cleansing Rows with a negative value in this column will be deleted
Fill cells with value data cleansing Fills cells from this column with a given value
Fill empty cells from above data cleansing Copies the value from the nearest non-empty cell above. When there is a series of empty cells, they all fill with the same value that was originally picked from above.
Note: This function is not compatible with Spark Jobs, and HDFS or S3 exports.
Fill empty cells with text data cleansing Fills empty cells from this column with a given value
Fill empty cells with value data cleansing Fills cells from this column with a given value
Standardize value (fuzzy matching) data cleansing Replaces invalid values with the closest correct value from a dictionary
Note: This function does not support Asian characters.
Hash data data masking Hashes the content of a column using the SHA-256 algorithm.
Mask data (obfuscation) data masking Masks data according to the domain information of the column (anonymisation)
Lookup data blending Blends columns from another dataset into this one
Calculate time since dates Calculates elapsed time since a date in the desired unit (year, month, day, hour, minute, second)
Calculate timestamp to date dates Given a timestamp (ellapsed time in second since Unix time, that is to say 01/01/1970), create a new column with the date
Change date format dates Changes the date format to use in a date column
Compare dates dates Compares this column to another column or a constant
Convert dates dates Converts dates from one calendar to another
Extract date parts dates Creates columns with year, month, quarter, day, hour, minute, second, etc.
Modify dates dates Adds or substract time unit amount
Deduplicate rows with identical values deduplication Deletes rows that are partly or entirely duplicate and keep the first one of the columns
Note: This function is not compatible with Spark Jobs, and HDFS or S3 exports.
Delete these filtered rows filtered Deletes only the rows that match the current filters
Keep these filtered rows filtered Keeps only the rows that match the current filters
Add, multiply, substract or divide math Performs an operation/calculation on this column with another one or with a fixed value: Add/sum (+), multiply (x), subtract (-), or divide(/)
Base 10 Logarithm math Computes the base 10 logarithm from a column
Calculate absolute value math Calculates the absolute value for all the numeric values in this column.
Cosine math Computes the trigonometric Cosine from a column
Exponential math Exponential of a column number
Max math Max with another column or a constant
Max of columns math Displays the max value between the records of two or more columns
Mean of columns math Calculates the mean value of the records of two or more columns
Median of columns math Calculates the median value of the records of two or more columns
Min math Min with another column or a constant
Min of columns math Displays the min value between the records of two or more columns
Modulo math Calculates the remainder of a division operation
Multiply columns math Multiplies the values of two or more columns and display the result in a new column
Natural logarithm math Computes the natural logarithm from a column
Negate math Negates a column number
Power math Powers with another column or a constant
Sine math Computes the trigonometric Sine from a column
Square root math Squares the root of a column number
Sum columns math Caculates the sum of the records of two or more columns and display the result in a new column
Tangent math Computes the trigonometric Tangent from a column
Compare numbers numbers Compares this column to another column or a constant
Format numbers numbers Allows to format number (decimal, integer & scientific) in a specific format or pattern
Generate sequence numbers Creates a sequence of positive or negative integers in a column, with the step of your choice
Note: This function is not compatible with Spark Jobs, and HDFS or S3 exports.
Remove fractional part numbers Rounds towards zero. (3.74 -> 3) and (-3.74 -> -3)
Remove negative values numbers Replaces negative integers or decimal values with an empty values.
Round value using ceil mode numbers Rounds up value to the nearest number, depending on the precision you set. (3.14 -> 4 if Precision is set to 0, and 3.14 -> 3.2 if Precision is set to 1)
Round value using down mode numbers Rounds towards zero. (3.74 -> 3 and -3.74 -> -3 for a Precision set to 0)
Round value using floor mode numbers Rounds down value to the nearest number, depending on the precision you set. (3.74 -> 3 if Precision is set to 0, and 3.74 -> 3.7 if Precision is set to 1)
Round value using halfUp mode numbers Rounds value to the closest number, depending on the precision you set. (3.14 -> 3 and 3.74 -> 4 for a Precision set to 0)
Extract phone number information phones Extracts additional information from phone numbers, such as phone type, country or carrier name. Each field is extracted in a new column.
Format phone number phones Formats a phone number to standard formats
Extract email parts split Extracts local and domain parts from an email
Extract full name parts split Extracts information from a full name, including the title, first name, nickname, middle name, last name and suffix
Extract number split Extracts number from the input
Extract values by regex split Extracts string tokens based on regex groups
Extract values by semantic type split Extracts different information in separated columns according to predefined or custom semantic types
Extract URL parts split Extracts protocol, host, port, query, etc... from an URL in separated columns
Split the text in parts split Splits column from separators
Calculate length strings Extracts the number of digits from a value (23562 -> 5)
Change to lower case strings Converts all of the cell text in this column to lower case
Change to title case strings Converts the text content from this column to title case (i.e. "this is an example" -> "This Is an Example")
Change to upper case strings Converts all of the cell text in this column to UPPER case (capitalize)
Contains text strings Checks if the cell contains the specified value
Extract a value by index strings Extracts part of the text (substring) to a new column
Magic fill strings Allows you to define a pattern via examples and apply the transformation to the whole column
Note: This function is not compatible with Spark Jobs, and HDFS or S3 exports.
Match similar text strings Creates a new column with true or false depending on whether the value is less than or equals the Levenshtein distance of a given value
Matches pattern strings Creates a new column with true or false depending on whether the value matches a given pattern
Remove consecutive characters strings Removes consecutive repeated characters
Remove part of the text strings Removes specified text from cells in this column
Remove trailing and leading characters strings Removes trailing and leading spaces or other specified characters (i.e. trim)
Search and replace strings Replaces cells or parts of cells which contain a specific value
Add extra characters strings advanced Adds extra characters (padding) on the left or on the right of the original value to match an expected size
Convert character width strings advanced Converts the character width to half or full width, or normalize strings
Find and group similar text strings advanced Replaces all similar values with the right one (i.e. cluster on fuzzy matching)
Note: This function does not support Asian characters.
Generate unique identifier (UUID) strings advanced Generates a unique identifier for each row of the column
Remove all non alpha numeric characters strings advanced Cleans values that contain unexpected characters by removing all characters that are not 0-9, Aa-Zz, and accented letters. Whitespaces are kept.

For example, T=+Âl**€en#d$ 20%1,7 will become TÂlend 2017.

Remove all non numeric characters strings advanced Cleans numeric values by removing all characters that are not 0-9. Decimal separators and whitespaces are kept.

For example, T=+Âl**€en#d$ 20%1,7 will become 201,7.

Simplify text (remove case, accent, etc.) strings advanced Simplifies the content of the column by putting it in lower case, removing accents, normalizing using NFKD, splitting combined characters and trimming the whole string.

For example, François becomes francois.

Functions that can be applied on the whole table

Name Category Description
Delete empty columns data cleansing Deletes all columns that contains only empty values
Delete empty rows data cleansing Removes rows where the cell in each column is empty
Change date format dates Changes the date format to use in a data column
Remove duplicate rows deduplication Deletes all rows that are exact duplicate and keep only one
Note: This function is not compatible with Spark Jobs, and HDFS or S3 exports.
Format numbers numbers Allows to format number (decimal, integer & scientific) in a specific format or pattern
Format phone number phones Formats a phone number to standard formats
Remove trailing and leading characters strings Removes trailing and leading spaces or other specified characters (i.e. trim)
Search and replace strings Replaces cells or parts of cells which contain a specific value