Splitting Strategy - 6.3

Talend ESB Mediation Developer Guide

In the current version of Hadoop, opening a file in append mode is disabled because it is not very reliable. So, for the moment, it is only possible to create new files. The Camel HDFS endpoint works around this limitation as follows:

  • If the splitStrategy option has been defined, the HDFS path is used as a directory, and files are created in it using the configured UuidGenerator.

  • Every time a splitting condition is met, a new file is created.

    The splitStrategy option is defined as a string with the following syntax:

    splitStrategy=<ST>:<value>,<ST>:<value>,*

where <ST> can be:

  • BYTES: a new file is created, and the old one is closed, when the number of written bytes is more than <value>.

  • MESSAGES: a new file is created, and the old one is closed, when the number of written messages is more than <value>.

  • IDLE: a new file is created, and the old one is closed, when no writing has happened in the last <value> milliseconds.
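The syntax and the three conditions above can be sketched in plain Java. This is only an illustrative model, not the Camel implementation: the class SplitStrategySketch, its Rule type, and the parse and shouldSplit methods are hypothetical names, and treating the IDLE boundary as "strictly more than <value>" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these names are not part of the Camel API.
final class SplitStrategySketch {

    enum Type { BYTES, MESSAGES, IDLE }

    // One <ST>:<value> pair taken from the splitStrategy option.
    static final class Rule {
        final Type type;
        final long value;

        Rule(Type type, long value) {
            this.type = type;
            this.value = value;
        }

        // True when this rule's condition is met for the file currently being written.
        // Assumption: every condition uses a strict "more than <value>" comparison.
        boolean isMet(long bytesWritten, long messagesWritten, long idleMillis) {
            switch (type) {
                case BYTES:    return bytesWritten > value;
                case MESSAGES: return messagesWritten > value;
                case IDLE:     return idleMillis > value;
                default:       return false;
            }
        }
    }

    // Parses a string such as "IDLE:1000,BYTES:5" into a list of rules.
    static List<Rule> parse(String option) {
        List<Rule> rules = new ArrayList<>();
        for (String pair : option.split(",")) {
            String[] parts = pair.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected <ST>:<value>, got: " + pair);
            }
            Type type = Type.valueOf(parts[0].trim());
            long value = Long.parseLong(parts[1].trim());
            rules.add(new Rule(type, value));
        }
        return rules;
    }

    // A new file is needed as soon as any one of the configured conditions is met.
    static boolean shouldSplit(List<Rule> rules, long bytesWritten,
                               long messagesWritten, long idleMillis) {
        for (Rule rule : rules) {
            if (rule.isMet(bytesWritten, messagesWritten, idleMillis)) {
                return true;
            }
        }
        return false;
    }
}
```

With this model, SplitStrategySketch.parse("IDLE:1000,BYTES:5") yields two rules, and shouldSplit reports a split once more than 5 bytes have been written or more than 1000 milliseconds have passed without a write.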

Note that, to use the BYTES or MESSAGES configuration, this strategy currently requires either setting an IDLE value or setting the HdfsConstants.HDFS_CLOSE header to false. Otherwise, the file is closed with each message.

For example:

hdfs2://localhost/tmp/simple-file?splitStrategy=IDLE:1000,BYTES:5

This means that a new file is created either when the current file has been idle for more than 1 second or when more than 5 bytes have been written. So, if you run hadoop fs -ls /tmp/simple-file, you will see that multiple files have been created.
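To see why several files appear, the example configuration (IDLE:1000,BYTES:5) can be replayed against a sequence of writes. The following is a simplified model under stated assumptions, not the endpoint's actual code: the class SplitSimulation and its countFiles method are hypothetical, and the model assumes that the splitting conditions are evaluated just before each write, whereas the real endpoint may close an idle file earlier.

```java
// Hypothetical simulation of the example thresholds; not part of the Camel API.
final class SplitSimulation {

    static final long MAX_BYTES = 5;          // from BYTES:5
    static final long MAX_IDLE_MILLIS = 1000; // from IDLE:1000

    // Each entry of writes is {timestampMillis, byteCount}, in ascending time order.
    // Returns how many files the sequence of writes would produce in this model.
    static int countFiles(long[][] writes) {
        int files = 0;
        long bytesInCurrentFile = 0;
        long lastWriteMillis = 0;

        for (long[] write : writes) {
            long now = write[0];
            long size = write[1];

            // The first write always opens a file; later writes open a new one
            // when either splitting condition is met for the current file.
            boolean needNewFile = files == 0
                    || bytesInCurrentFile > MAX_BYTES
                    || now - lastWriteMillis > MAX_IDLE_MILLIS;

            if (needNewFile) {
                files++;
                bytesInCurrentFile = 0;
            }

            bytesInCurrentFile += size;
            lastWriteMillis = now;
        }
        return files;
    }
}
```

For instance, four 4-byte writes at 0, 100, 200, and 2000 milliseconds produce three files in this model: the third write starts a new file because 8 bytes have already been written, and the fourth starts another because the file was idle for 1800 milliseconds.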