Aggregator - 6.3

Talend ESB Mediation Developer Guide

Talend Data Fabric
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for ESB
Talend Real-Time Big Data Platform
Design and Development
Talend ESB

The Aggregator from the EIP patterns allows you to combine a number of messages together into a single message.

A correlation Expression is used to determine the messages which should be aggregated together. If you want to aggregate all messages into a single message, just use a constant expression. An AggregationStrategy is used to combine all the message exchanges for a single correlation key into a single message exchange.

The aggregator provides a pluggable repository which you can implement your own org.apache.camel.spi.AggregationRepository. If you need a persistent repository then you can use either Camel HawtDB, LevelDB, or SQL Component.

You can manually trigger completion of all current aggregated exchanges by sending a message containing the header Exchange.AGGREGATION_COMPLETE_ALL_GROUPS set to true. The message is considered a signal message only, the message headers/contents will not be processed otherwise. Alternatively, starting with Camel 2.11, the header Exchange.AGGREGATION_COMPLETE_ALL_GROUPS_INCLUSIVE can be set to >true to trigger completion of all groups after processing the current message.

The Apache Camel website maintains several examples of this EIP in use.

Aggregator options

The aggregator supports the following options:






Mandatory Expression which evaluates the correlation key to use for aggregation. The Exchange which has the same correlation key is aggregated together. If the correlation key could not be evaluated an Exception is thrown. You can disable this by using the ignoreBadCorrelationKeys option.



Mandatory AggregationStrategy which is used to merge the incoming Exchange with the existing already merged exchanges. At first call the oldExchange parameter is null. On subsequent invocations the oldExchange contains the merged exchanges and newExchange is of course the new incoming Exchange. The strategy can also be a TimeoutAwareAggregationStrategy implementation, supporting the timeout callback. Here, Camel will invoke the timeout method when the timeout occurs. Notice that the values for index and total parameters will be -1, and the timeout parameter will only be provided if configured as a fixed value.



A reference to lookup the AggregationStrategy in the Registry. From Camel 2.12 onwards you can also use a POJO as the AggregationStrategy, see further below for details.



Camel 2.12: This option can be used to explicit declare the method name to use, when using POJOs as the AggregationStrategy. See further below for more details.



Camel 2.12: If this option is false then the aggregate method is not used for the very first aggregation. If this option is true then null values is used as the oldExchange (at the very first aggregation), when using POJOs as the AggregationStrategy. See further below for more details.



number of messages aggregated before the aggregation is complete. This option can be set as either a fixed value or using an Expression which allows you to evaluate a size dynamically; it will use Integer as result. If both are set, Camel will fallback to use the fixed value if the Expression result was null or 0.



Time in milliseconds that an aggregated exchange should be inactive before it is complete. This option can be set as either a fixed value or using an Expression which allows you to evaluate a timeout dynamically; it will use Long as result. If both are set Camel will fallback to use the fixed value if the Expression result was null or 0. You cannot use this option together with completionInterval, only one of the two can be used.



A repeating period in milliseconds by which the aggregator will complete all current aggregated exchanges. Camel has a background task which is triggered every period. You cannot use this option together with completionTimeout, only one of them can be used.



A Predicate to indicate when an aggregated exchange is complete.



This option is if the exchanges are coming from a Batch Consumer. Then when enabled the Aggregator will use the batch size determined by the Batch Consumer in the message header CamelBatchSize. See more details at Batch Consumer. This can be used to aggregate all files consumed from a File endpoint in that given poll.



Indicates completing all current aggregated exchanges when the context is stopped.



Whether or not to eager check for completion when a new incoming Exchange has been received. This option influences the behavior of the completionPredicate option as the Exchange being passed in changes accordingly. When false the Exchange passed in the Predicate is the aggregated Exchange which means any information you may store on the aggregated Exchange from the AggregationStrategy is available for the Predicate. When true the Exchange passed in the Predicate is the incoming Exchange, which means you can access data from the incoming Exchange.



If enabled then Camel will group all aggregated Exchanges into a single combined org.apache.camel.impl.GroupedExchange holder class that holds all the aggregated Exchanges. And as a result only one Exchange is being sent out from the aggregator. Can be used to combine many incoming Exchanges into a single output Exchange without coding a custom AggregationStrategy yourself. Note this option does not support persistent aggregator repositories. See further below for an example and more details.



Whether or not to ignore correlation keys which could not be evaluated to a value. By default Camel will throw an Exception, but you can enable this option and ignore the situation instead.



Whether or not too late Exchanges should be accepted or not. You can enable this to indicate that if a correlation key has already been completed, then any new exchanges with the same correlation key be denied. Camel will then throw a closedCorrelationKeyException exception. When using this option you pass in a integer which is a number for a LRUCache which keeps that last X number of closed correlation keys. You can pass in 0 or a negative value to indicate a unbounded cache. By passing in a number you are ensured that cache won't grow too big if you use a log of different correlation keys.



Whether or not exchanges which complete due to a timeout should be discarded. If enabled then when a timeout occurs the aggregated message will not be sent out but dropped (discarded).



Allows you to plugin you own implementation of Camel's AggregationRepository class which keeps track of the current inflight aggregated exchanges. Camel uses by default a memory based implementation.



Reference to lookup a aggregationRepository in the Registry.



When aggregated are completed they are being send out of the aggregator. This option indicates whether or not Camel should use a thread pool with multiple threads for concurrency. If no custom thread pool has been specified then Camel creates a default pool with 10 concurrent threads.



If using parallelProcessing you can specify a custom thread pool to be used. In fact also if you are not using parallelProcessing this custom thread pool is used to send out aggregated exchanges as well.



Reference to lookup a executorService in the Registry


If using either of the completionTimeout, completionTimeoutExpression, or completionInterval options a background thread is created to check for the completion for every aggregator. Set this option to provide a custom thread pool to be used rather than creating a new thread for every aggregator.


Reference to lookup a timeoutCheckerExecutorService in the Registry.



Starting with Camel 2.11, turns on using optimistic locking, which requires that the aggregationRepository setting be used.



Starting with Camel 2.11.1,allows to configure retry settings when using optimistic locking.

Exchange Properties

The following properties are set on each aggregated Exchange:






The total number of Exchanges aggregated into this combined Exchange.



Indicator how the aggregation was completed as a value of either: predicate, size, consumer, timeout, {{forceCompletion}} or interval.

About AggregationStrategy

The AggregationStrategy is used for aggregating the old (lookup by its correlation id) and the new exchanges together into a single exchange. Possible implementations include performing some kind of combining or delta processing, such as adding line items together into an invoice or just using the newest exchange and removing old exchanges such as for state tracking or market data prices; where old values are of little use.

Notice the aggregation strategy is a mandatory option and must be provided to the aggregator.

If your aggregation strategy implements TimeoutAwareAggregationStrategy, then Camel will invoke the timeout method when the timeout occurs. Notice that the values for index and total parameters will be -1, and the timeout parameter will be provided only if configured as a fixed value. You must not throw any exceptions from the timeout method.

If your aggregation strategy implements CompletionAwareAggregationStrategy, then Camel will invoke the onComplete method when the aggregated Exchange is completed. This allows you to do any last minute custom logic such as to cleanup some resources, or additional work on the exchange as its now completed. You must not throw any exceptions from the onCompletion method.

About completion

When aggregation Exchanges at some point you need to indicate that the aggregated exchanges is complete, so they can be send out of the aggregator. Camel allows you to indicate completion in various ways as follows:

  • completionTimeout - Is an inactivity timeout in which is triggered if no new exchanges have been aggregated for that particular correlation key within the period.

  • completionInterval - Once every X period all the current aggregated exchanges are completed.

  • completionSize - Is a number indicating that after X aggregated exchanges it's complete.

  • completionPredicate - Runs a Predicate when a new exchange is aggregated to determine if we are complete or not

  • completionFromBatchConsumer - Special option for Batch Consumer which allows you to complete when all the messages from the batch has been aggregated. |

  • forceCompletionOnStop - Indicates to complete all current aggregated exchanges when the context is stopped.

Notice that all the completion ways are per correlation key. And you can combine them in any way you like. It's basically the first which triggers that wins. So you can use a completion size together with a completion timeout. Only completionTimeout and completionInterval cannot be used at the same time.

Notice the completion is a mandatory option and must be provided to the aggregator. If not provided Camel will throw an Exception on startup.