List of functions - Cloud

Talend Cloud Data Preparation User Guide

author
Talend Documentation Team
EnrichVersion
Cloud
EnrichProdName
Talend Cloud
task
Data Quality and Preparation > Cleansing data
EnrichPlatform
Talend Data Preparation

This table lists all the functions available in Talend Cloud Data Preparation and their effects.

Applying any of the functions listed below, with the exception of Math functions, will return the input value if the data is considered invalid or if the data type is incompatible with the selected function, no matter if the Create new column check box is selected or not.

In the case of Math functions, empty values are ignored, while invalid values will lead to an empty result when the function is applied.

Note:

The following functions, that rely on matching algorithms, do not support Asian characters:

  • Standardize value (fuzzy matching)
  • Find and group similar text

Functions that can be applied on rows

Name Category Description
Delete row data cleansing Delete this row
Make as header data cleansing Cells of this row will become columns names, the row will be deleted

Functions that can be applied on columns

Name Category Description
Negate value boolean Reverse the boolean value of cells from this column
Change data type column metadata Change type of this column (number, text, date, etc.)
Change semantic domain column metadata Change semantic domain of this column (city, zipcode, last name, etc.)
Create new column column metadata Copy a column or create a brand new one
Delete column column metadata Delete the selected columns
Duplicate column column metadata Create an exact copy of this column
Rename column column metadata Rename this column
Concatenate columns columns Merge the content of two or more columns into a new one.
Concatenate with columns  
Reorder columns columns Change column order
Swap columns columns Swap the values with an other column
Convert country names and codes conversions Convert a country name to an ISO-3166 country code (alpha-2, alpha-3 or numeric) or vice versa. This function supports English, French and Japanese names and codes.Merge the content of this column with another one, and displays it in a new column
Convert distance conversions Convert distance from one unit to another
Convert duration conversions Convert duration from one unit to another
Convert temperature conversions Convert temperature measurement units
Clear on matching value data cleansing Clear cells that match the value
Clear the cells with invalid values data cleansing Clear cells that contain a value recognized as invalid
Coalesce columns data cleansing Merge the content of this column with another one, and displays it in a newRetrieve the first not null value for each row among the selected columns
Delete empty rows data cleansing Remove rows where the cell in each column is empty
Delete the rows that match data cleansing Delete rows where a cell in this column has a specific value
Delete the rows with empty cell data cleansing Delete rows that have empty cells
Delete the rows with invalid cell data cleansing Delete rows which contain an invalid cell
Delete the rows with negative values data cleansing rows with a negative value in this column will be deleted
Fill cells with value data cleansing Fill cells from this column with a given value
Fill empty cells from above data cleansing Copy the value from the nearest non-empty cell above. When there is a series of empty cells, they all fill with the same value that was originally picked from above.
Fill empty cells with text data cleansing Fill empty cells from this column with a given value
Fill empty cells with value data cleansing Fill cells from this column with a given value
Standardize value (fuzzy matching) data cleansing Replaces invalid values with the closest correct value from a dictionary
Note: This function does not support Asian characters.
Hash data data masking Hash the content of a column using the SHA-256 algorithm.
Mask data (obfuscation) data masking Mask data according to the domain information of the column (anonymisation)
Lookup data blending Blends columns from another dataset into this one
Calculate time until dates Calculate elapsed time since a date in the desired unit (year, month, day, hour, minute, second)
Calculate timestamp to date dates Given a timestamp (ellapsed time in second since Unix time, that is to say 01/01/1970), create a new column with the date
Change date format dates Change the date format to use in a date column
Compare dates dates Compare this column to another column or a constant
Convert dates dates Convert dates from one calendar to another
Extract date parts dates Create columns with year, month, quarter, day, hour, minute, second, etc.
Modify Dates dates Add or substract time unit amount
Deduplicate rows with identical values deduplication Delete rows that are partly or entirely duplicate and keep the first one of the columns.
Delete these filtered rows filtered Delete only the rows that match the current filters
Keep these filtered rows filtered Keep only the rows that match the current filters
Add, multiply, substract or divide math Perform an operation/calculation on this column with another one or with a fixed value: Add/sum (+), multiply (x), subtract (-), or divide(/)
Base 10 Logarithm math Compute the base 10 logarithm from a column
Calculate absolute value math Calculate the absolute value for all the numeric values in this column.
Cosine math Compute the trigonometric Cosine from a column
Exponential math Exponential of a column number
Max math Max with another column or a constant
Max of columns math Display the max value between the records of two or more columns
Mean of columns math Calculate the mean value of the records of two or more columns
Median of columns math Calculate the median value of the records of two or more columns
Min math Min with another column or a constant
Min of columns math Display the min value between the records of two or more columns
Modulo math Calculate the remainder of a division operation
Multiply columns math Multiply the values of two or more columns and display the result in a new column
Natural logarithm math Compute the natural logarithm from a column
Negate math Negate a column number
Power math Power with another column or a constant
Sine math Compute the trigonometric Sine from a column
Square root math Square root of a column number
Sum columns math Caculate the sum of the records of two or more columns and display the result in a new column
Tangent math Compute the trigonometric Tangent from a column
Compare numbers numbers Compare this column to another column or a constant
Format numbers numbers Allow to format number (decimal, integer & scientific) in a specific format or pattern
Generate sequence numbers Creates a sequence of positive or negative integers in a column, with the step of your choice
Remove fractional part numbers Round towards zero. (3.74 -> 3) and (-3.74 -> -3)
Remove negative values numbers Replaces negative integers or decimal values with an empty values.
Round value using ceil mode numbers Round up value to the nearest number, depending on the precision you set. (3.14 -> 4 if Precision is set to 0, and 3.14 -> 3.2 if Precision is set to 1)
Round value using down mode numbers Round towards zero. (3.74 -> 3 and -3.74 -> -3 for a Precision set to 0)
Round value using floor mode numbers Round down value to the nearest number, depending on the precision you set. (3.74 -> 3 if Precision is set to 0, and 3.74 -> 3.7 if Precision is set to 1)
Round value using halfUp mode numbers Round value to the closest number, depending on the precision you set. (3.14 -> 3 and 3.74 -> 4 for a Precision set to 0)
Extract phone number information phones Extract additional information from phone numbers, such as phone type, country or carrier name. Each field is extracted in a new column.
Format phone number phones Format a phone number to standard formats
Extract email parts split Extract local and domain parts from an email
Extract full name parts split Extract information from a full name, including the title, first name, nickname, middle name, last name and suffix.
Extract number split Extract number from the input
Extract values by regex split Extract string tokens based on regex groups
Extract values by semantic type split Extract different information in seperated columns according to predefined or custom semantic types.
Extract URL parts split Extract protocol, host, port, query, etc... from an URL in separated columns
Split the text in parts split Split column from separators
Calculate length strings Extract the number of digits from a value (23562 -> 5)
Change to lower case strings Converts all of the cell text in this column to lower case
Change to title case strings Converts the text content from this column to title case (i.e. "data prep" -> "Data Prep")
Change to upper case strings Converts all of the cell text in this column to UPPER case (capitalize)
Contains text strings Checks if the cell contains the specified value
Extract a value by index strings Extract part of the text (substring) to a new column
Magic fill strings Allows you to define a pattern via examples and apply the transformation to the whole column
Match similar text strings Create a new column with <i>true</i> or <i>false</i> regarding if the value is less or equals the Levenshtein distance of a given value
Matches pattern strings Create a new column with <i>true</i> or <i>false</i> regarding if the value that matches or not a given pattern
Remove consecutive characters strings Remove consecutive repeated characters
Remove part of the text strings Remove specified text from cells in this column
Remove trailing and leading characters strings Remove trailing and leading spaces or other specified characters (i.e. trim)
Search and replace strings Replace cells or parts of cells which contain a specific value
Add extra characters strings advanced Add extra characters (padding) on the left or on the right of the original value to match an expected size
Convert character width strings advanced Convert the character width to half or full width, or normalize strings.
Find and group similar text strings advanced Replace all similar values with the right one (i.e. cluster on fuzzy matching)
Note: This function does not support Asian characters.
Generate unique identifier (UUID) strings advanced Generates a unique identifier for each row of the column.
Remove all non alpha numeric characters strings advanced Cleans values that contain unexpected characters by removing all characters that are not 0-9, Aa-Zz, and accented letters. Whitespaces are kept.

For example, T=+Âl**€en#d$ 20%1,7 will become TÂlend 2017.

Remove all non numeric characters strings advanced Cleans numeric values by removing all characters that are not 0-9. Decimal separators and whitespaces are kept.

For example, T=+Âl**€en#d$ 20%1,7 will become 201,7.

Simplify text (remove case, accent, etc.) strings advanced Simplifies the content of the column by putting it in lower case, removing accents, normalizing using NFKD, splitting combined characters and trimming the whole string.

For example, François becomes francois.

Functions that can be applied on the whole table

Name Category Description
Delete empty columns data cleansing Delete all columns that contains only empty values
Delete empty rows data cleansing Remove rows where the cell in each column is empty
Change date format dates Change the date format to use in a data column
Remove duplicate rows deduplication Delete all rows that are exact duplicate and keep only one
Format numbers numbers Allow to format number (decimal, integer & scientific) in a specific format or pattern
Format phone number phones Format a phone number to standard formats
Remove trailing and leading characters strings Remove trailing and leading spaces or other specified characters (i.e. trim)
Search and replace strings Replace cells or parts of cells which contain a specific value