Processing (Integration) components

tAggregateRow Receives a flow and aggregates it based on one or more columns.
tAggregateSortedRow Aggregates sorted input data for the output columns based on a set of operations. Each output column is configured with as many rows as required, the operation to be carried out, and the input column from which the data will be taken.
tCacheIn Offers faster access to the persistent data.
tCacheOut Persists the input RDDs depending on the specific storage level you define in order to offer faster access to these datasets later.
tConvertType Converts one Talend Java type to another automatically, thus avoiding compilation errors.
tDenormalize Denormalizes the input flow based on one column.
tDenormalizeSortedRow Synthesizes sorted input flow to save memory.
tExternalSortRow Sorts input data based on one or several columns, by type (numeric or alphabetical) and order (ascending or descending), using an external sort application.
tExtractDelimitedFields Generates multiple columns from a delimited string column.
tExtractDynamicFields Parses a Dynamic column to create standard output columns.
tExtractEDIField Reads the EDI structured data from an EDIFACT message file, generates an XML according to the EDIFACT family and the EDIFACT type, extracts data by parsing the generated XML using the XPath queries manually defined or coming from the Repository wizard, and finally sends the data to the next component via a Row connection.
tExtractJSONFields Extracts the desired data from JSON fields based on the JSONPath or XPath query.
tExtractPositionalFields Extracts data and generates multiple columns from a formatted string using positional fields.
tExtractRegexFields Extracts data and generates multiple columns from a formatted string using regex matching.
tExtractXMLField Reads the XML structured data from an XML field and sends the data as defined in the schema to the following component.
tFilterColumns Homogenizes schemas by reordering columns, removing unwanted columns, or adding new columns.
tFilterRow Filters input rows by setting one or more conditions on the selected columns.
tJoin Performs inner or outer joins between the main data flow and the lookup flow.
tNormalize Normalizes the input flow following the SQL standard to help improve data quality, thus easing data updates.
tPartition Allows you to visually define how an input dataset is partitioned.
tReplace Cleanses all files before further processing.
tReplicate Duplicates the incoming schema into two identical output flows.
tSample Returns a sample subset of the data being processed.
tSampleRow Selects rows according to a list of single lines and/or a list of groups of lines.
tSortRow Helps create metrics and classification tables.
tSplitRow Splits one input row into several output rows.
tSqlRow Performs SQL queries over input datasets.
tTop Sorts data and outputs a given number of rows, starting from the first row of this data.
tTopBy Groups and sorts data and outputs a given number of rows, starting from the first row of each group.
tUniqRow Ensures data quality of input or output flow in a Job.
tUnite Centralizes data from various and heterogeneous sources.
tWindow Applies a given Spark window on the incoming RDDs and sends the window-based RDDs to its following component.
tWriteAvroFields Transforms the incoming data into Avro files.
tWriteDelimitedFields Converts records into byte arrays.
tWriteDynamicFields Creates a dynamic schema from input columns in the component.
tWriteJSONField Transforms the incoming data into JSON fields and transfers them to a file, a database table, etc.
tWritePositionalFields Converts records into byte arrays.
tWriteXMLFields Converts records into byte arrays.
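These components are configured graphically in the Studio rather than coded by hand, but the row transformations they perform are straightforward. As a rough conceptual sketch (not Talend-generated code, and the method names here are illustrative only), a tNormalize-style operation splits a delimited column into one row per value, and a tDenormalize-style operation merges those rows back into a single delimited row:

```java
import java.util.Arrays;
import java.util.List;

public class NormalizeSketch {
    // tNormalize-style: one input value "red,green,blue" becomes three rows.
    static List<String> normalize(String row, String sep) {
        return Arrays.asList(row.split(sep));
    }

    // tDenormalize-style: several rows are merged back into one delimited row.
    static String denormalize(List<String> rows, String sep) {
        return String.join(sep, rows);
    }

    public static void main(String[] args) {
        List<String> rows = normalize("red,green,blue", ",");
        System.out.println(rows);                   // [red, green, blue]
        System.out.println(denormalize(rows, ",")); // red,green,blue
    }
}
```

In a Job, the separator and the column to normalize or denormalize are set in the component's Basic settings rather than passed as arguments.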
