tSurviveFields Standard properties

Deduplication

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Data Services Platform
Talend ESB
Talend Open Studio for Big Data
Talend Big Data
Talend Open Studio for ESB
Talend Big Data Platform
Talend Real-Time Big Data Platform
Talend Open Studio for Data Integration
Talend Open Studio for MDM
Talend Data Management Platform
Talend Data Integration
Talend MDM Platform
Talend Data Fabric
task
Data Quality and Preparation > Third-party systems > Data Quality components > Deduplication components
Design and Development > Third-party systems > Data Quality components > Deduplication components
Data Governance > Third-party systems > Data Quality components > Deduplication components
EnrichPlatform
Talend Studio

These properties are used to configure tSurviveFields running in the Standard Job framework.

The Standard tSurviveFields component belongs to the Data Quality, the Talend MDM and the Processing families.

This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Built-in: The schema will be created and stored locally for this component only. Related topic: see Talend Studio User Guide.

Repository: The schema already exists and is stored in the Repository, so it can be reused in various projects and Job designs. Related topic: see Talend Studio User Guide.

Key

Define the merge sets, the values of which will be used for calculations.

Output column: Select the column name from the list that reflects the schema structure you defined. You can add as many output columns as you wish to make more precise aggregations.

Input column: Match each input column with its output column. The two names can differ if the output column of the aggregation set needs a different name.

Warning:

The columns in the Key table must NOT appear in the Operations table. If you want all the columns of the output schema to be filled in, they must appear either in the Key table or in the Operations table.
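The rule in this warning can be illustrated with a minimal Python sketch. This is not Talend code: the helper `check_tables` and the column names are hypothetical, and the sketch only checks the two constraints stated above, namely that no column is listed in both tables and that every output column is listed in one of them.

```python
def check_tables(output_schema, key_columns, operation_columns):
    """Return (overlap, unfilled) for a hypothetical Key/Operations setup."""
    key = set(key_columns)
    ops = set(operation_columns)
    # Columns listed in both tables violate the rule in the warning.
    overlap = sorted(key & ops)
    # Output columns listed in neither table will not be filled in.
    unfilled = [c for c in output_schema if c not in key and c not in ops]
    return overlap, unfilled


overlap, unfilled = check_tables(
    output_schema=["customer_id", "email", "last_update", "city"],
    key_columns=["customer_id"],
    operation_columns=["email", "last_update"],
)
print(overlap)   # []
print(unfilled)  # ['city']
```

In this example, the city column would stay empty in the output because it appears in neither table.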

Operations

Output column: From the list, select the output column which will result from the selected merge operation.

Function: Select the type of merge operation to be performed from the list. The list includes count, min, max, avg, sum, first, last, list, list(object), count(distinct), standard deviation, max length and best rank.

Input column: From the list, select the input column from which the values are to be selected for the merge operation.

Rank column: Only available with the best rank function. From the list, select the column you want to use as a rank value for the merge operation. The output column then takes the input column value from the record with the greatest rank.

Ignore null values: Select the check boxes which correspond to the names of the columns for which you want the NULL value to be ignored.
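To make the interaction between the Key table and the Operations table concrete, the following Python sketch groups rows by a key and merges each group with a few of the functions listed above. It is an illustration under assumptions, not the component's implementation: the function `survive`, the sample columns, the tie handling, and the choice to drop NULL values before ranking are all hypothetical.

```python
from collections import defaultdict


def survive(rows, key, operations):
    """Group rows by `key` and merge each group into one output row.

    `operations` maps an output column to (function, input column, rank column).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = []
    for key_value, group in groups.items():
        out = {key: key_value}
        for out_col, (func, in_col, rank_col) in operations.items():
            # Assumption: "Ignore null values" is selected for every column.
            kept = [r for r in group if r[in_col] is not None]
            values = [r[in_col] for r in kept]
            if not values:
                out[out_col] = None
            elif func == "first":
                out[out_col] = values[0]
            elif func == "max":
                out[out_col] = max(values)
            elif func == "count":
                out[out_col] = len(values)
            elif func == "best rank":
                # Keep the input value from the row with the greatest rank.
                out[out_col] = max(kept, key=lambda r: r[rank_col])[in_col]
        result.append(out)
    return result


rows = [
    {"id": 1, "email": None, "score": 3, "amount": 10},
    {"id": 1, "email": "a@example.com", "score": 5, "amount": 25},
    {"id": 1, "email": "b@example.com", "score": 4, "amount": 15},
    {"id": 2, "email": "c@example.com", "score": 1, "amount": 7},
]
operations = {
    "email": ("best rank", "email", "score"),
    "amount": ("max", "amount", None),
}
merged = survive(rows, "id", operations)
```

Here `id` plays the role of the Key column, while `email` and `amount` are filled by the Operations table. For the merge set with `id` 1, the surviving email is the one with the greatest `score`.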

Advanced settings

Delimiter (only for list operation)

Between double quotation marks, enter the delimiter you want to use for the list operation.
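As a rough illustration of what the delimiter controls, the Python sketch below joins the values of one merge set into a single string. The function name `list_operation` and the default delimiter are assumptions made for the example, not documented behavior of the component.

```python
def list_operation(values, delimiter=","):
    # Join the values of one merge set into a single delimited string.
    return delimiter.join(str(v) for v in values)


joined = list_operation(["Paris", "Lyon", "Nice"], delimiter=" | ")
print(joined)  # Paris | Lyon | Nice
```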

Use financial precision, this is the max precision for "sum" and "avg" operations, checked option heaps more memory and slower than unchecked.

This check box, which enables financial precision, is selected by default. Clear the check box if you want to use less memory and thus optimize performance.
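The trade-off behind this option can be sketched in Python by comparing binary floating-point arithmetic with exact decimal arithmetic. This is only an analogy: it assumes that financial precision means an arbitrary-precision decimal type, which is not stated on this page, and the sample amounts are invented.

```python
from decimal import Decimal

amounts = ["0.10"] * 10

# Binary floating point: fast, but 0.10 has no exact representation.
float_sum = sum(float(a) for a in amounts)

# Exact decimal arithmetic: slower and heavier, but no rounding drift.
decimal_sum = sum(Decimal(a) for a in amounts)

print(float_sum == 1.0)                  # False
print(decimal_sum == Decimal("1.00"))    # True
```

The decimal sum is exact at the cost of more memory and processing time, which mirrors the performance note above.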

Check type overflow (slower)

Select this check box to check data types for overflow, so that the Job does not crash.

If you select this check box, the system will be slower.
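The kind of problem an overflow check guards against can be sketched as follows. Python integers do not overflow, so the sketch simulates a 32-bit signed integer. The helper names are hypothetical, and raising an error is an assumption: this page does not say how the component reacts when it detects an overflow.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a 32-bit signed integer


def wrap_int32(value):
    """Simulate the silent wrap-around of 32-bit signed arithmetic."""
    return (value - INT_MIN) % 2**32 + INT_MIN


def checked_sum(values):
    """Sum values, failing as soon as the total leaves the 32-bit range."""
    total = 0
    for v in values:
        total += v
        if not INT_MIN <= total <= INT_MAX:
            raise OverflowError(f"sum {total} does not fit in a 32-bit integer")
    return total


values = [2_000_000_000, 2_000_000_000]
unchecked = wrap_int32(sum(values))
print(unchecked)  # -294967296
```

Without the check, the sum silently becomes a negative number. With the check, each addition costs an extra comparison, which is consistent with the performance note above.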

Check ULP (Unit in the Last Place), ensure that a value will be incremented or decremented correctly, only for float and double types. (slower)

Select this check box to launch ULP verification.

If you select this check box, the system will be slower.
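The following Python sketch shows why a ULP check matters for float and double values: once a running total is large enough, the gap between two adjacent representable numbers exceeds the increment, and the addition has no effect. The sample values are illustrative, and `math.ulp` requires Python 3.9 or later. How the component reacts to such a case is not described here.

```python
import math

total = 1e16
increment = 1.0

# Gap between `total` and the next representable double.
spacing = math.ulp(total)

# The increment is smaller than the gap, so the addition is lost.
lost = total + increment == total

print(spacing)  # 2.0
print(lost)     # True
```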

tStatCatcher Statistics

Select this check box to collect log data at the Job and the component levels.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the component has a Die on error check box and this check box is cleared.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage

Usage rule

This component requires an input component and an output component.