Streaming Execution - 6.3

Talend Data Mapper User Guide

Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Design and Development
Talend Studio

Streaming execution lets a transformation process inputs of any size. Without streaming execution, the entire input of the transformation is loaded into memory before the transformation is executed, which limits the amount of data that can be transformed to what fits in the available memory.

Streaming execution works by accumulating chunks of input data and then executing the transformation on each chunk separately. Because no single execution sees the whole input, there are limitations on what may be specified in the transformation.
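The chunking idea can be illustrated outside Talend Data Mapper with a minimal Python sketch. The names iter_chunks, transform_chunk, and stream_transform are hypothetical and are not part of the product. The sketch also sizes chunks by record count for simplicity, which is an assumption made only for illustration. It shows how transforming one bounded chunk at a time keeps memory use independent of the total input size.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_chunks(records: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    while True:
        # Only one chunk is held in memory at any time.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def transform_chunk(chunk: List[str]) -> List[str]:
    """Stand-in transformation (hypothetical): upper-case every record."""
    return [record.upper() for record in chunk]


def stream_transform(
    records: Iterable[str],
    chunk_size: int,
    transform: Callable[[List[str]], List[str]] = transform_chunk,
) -> Iterator[str]:
    """Apply the transformation to each chunk separately and stream the results."""
    for chunk in iter_chunks(records, chunk_size):
        yield from transform(chunk)
```

Because each call to transform_chunk sees only its own chunk, any logic that needs every record at once cannot be expressed inside it, which mirrors the restrictions on streamed transformations.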

To stream a transformation, select the Stream Input property on the SimpleLoop function. The loop is then automatically partitioned into chunks, with the chunk size depending on how much memory is allocated per chunk, and each chunk is processed separately. As a result, you cannot use an aggregate function that spans all occurrences of the loop. You can still perform aggregation by using the GetVariable and SetVariable functions, because these maintain their state across multiple transformation executions.
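The variable-based aggregation pattern can be sketched in Python as follows. The set_variable and get_variable helpers here are hypothetical stand-ins that only imitate the idea of state surviving between executions; they are not the signatures of the Talend GetVariable and SetVariable functions. Each call to process_chunk plays the role of one transformation execution over one chunk.

```python
from typing import Dict, Iterable, List

# Hypothetical variable store that outlives any single chunk execution.
_variables: Dict[str, float] = {}


def set_variable(name: str, value: float) -> None:
    """Store a value so that later chunk executions can read it."""
    _variables[name] = value


def get_variable(name: str, default: float = 0.0) -> float:
    """Read a stored value, or the default if it was never set."""
    return _variables.get(name, default)


def process_chunk(amounts: List[float]) -> None:
    """One execution: fold this chunk's values into the running total."""
    running = get_variable("total")
    for amount in amounts:
        running += amount
    set_variable("total", running)


def total_over_chunks(chunks: Iterable[List[float]]) -> float:
    """Run every chunk separately and return the accumulated total."""
    _variables.clear()  # start from a clean state for this run
    for chunk in chunks:
        process_chunk(chunk)
    return get_variable("total")
```

No execution ever sees all the values, yet the final total covers every chunk because the running value is carried from one execution to the next.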

If you select the Stream Input property on the SimpleLoop function, you cannot use sort keys, because sorting requires the complete input and therefore cannot be performed while streaming.

If you select the Stream Input property on the SimpleLoop function and you also select a distinct child element, the input is assumed to be already sorted by that child element, so the distinct calculation can be done without further sorting.
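The reason sorted input makes a streaming distinct calculation possible can be shown with a short Python sketch. The function name distinct_sorted is hypothetical and this is not the product's implementation. When equal values are adjacent, comparing each value with the previous one is enough, so nothing beyond the last value seen has to be kept in memory.

```python
from typing import Any, Iterable, Iterator

# Sentinel marking "no previous value yet", so that None is a valid input value.
_NO_VALUE = object()


def distinct_sorted(values: Iterable[Any]) -> Iterator[Any]:
    """Yield each value once, assuming equal values are adjacent in the input."""
    previous = _NO_VALUE
    for value in values:
        if previous is _NO_VALUE or value != previous:
            yield value
            previous = value
```

If the input is not sorted by the distinct key, equal values that are not adjacent are emitted more than once, because the sketch never looks further back than one value.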