Handling compressed data - 7.1

Talend Real-time Big Data Platform Studio User Guide

Talend Documentation Team

Hadoop supports many different file compression formats that bring benefits such as reducing the space required to store files and speeding up data transfer.

In a Job, you can work directly with compressed files by using file system components such as tHDFSInput or tFileInputDelimited.

In this section, MapReduce components are used to show how to read or write compressed files.


  • After opening a MapReduce Job in the workspace, use tHDFSInput to read compressed files stored in HDFS. In the Component view of tHDFSInput, enter the name of the compressed file to be read, including its extension.

    Note that in the Standard version of tHDFSInput, you need to select the Uncompress the data check box and then choose the format to decompress the data from.

  • If you need to write compressed files to HDFS, place the tHDFSOutput component in the workspace, select the Compress the data check box, and then choose the format to compress the data to.
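
The read and write behavior described above can be illustrated outside the Studio with a minimal Java sketch. It is not the code that Talend generates and it does not use the Hadoop API. It only assumes that the compression format can be recognized from the file extension (here, .gz for gzip) and uses the JDK's java.util.zip classes. The class name CompressedFileSketch, its helper methods and the sample file names are hypothetical and serve illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: an in-memory stand-in for "write compressed, read compressed".
public class CompressedFileSketch {

    // Assumption: the format is inferred from the file extension (.gz means gzip).
    static boolean isGzip(String fileName) {
        return fileName.toLowerCase().endsWith(".gz");
    }

    // Write side: returns the bytes that would be stored for the given file name.
    static byte[] write(String fileName, String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        if (!isGzip(fileName)) {
            return raw; // no compression requested
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // complete gzip stream, trailer included
    }

    // Read side: decompresses only when the extension indicates gzip.
    static String read(String fileName, byte[] stored) {
        try (InputStream in = isGzip(fileName)
                ? new GZIPInputStream(new ByteArrayInputStream(stored))
                : new ByteArrayInputStream(stored)) {
            ByteArrayOutputStream text = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                text.write(chunk, 0, n);
            }
            return new String(text.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Build some repetitive delimited rows so that compression has an effect.
        StringBuilder rows = new StringBuilder("id;name\n");
        for (int i = 0; i < 100; i++) {
            rows.append(i).append(";customer\n");
        }
        String text = rows.toString();

        byte[] stored = write("customers.csv.gz", text);
        System.out.println("raw bytes:    " + text.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("stored bytes: " + stored.length);
        System.out.println("round trip ok: " + text.equals(read("customers.csv.gz", stored)));
    }
}
```

In the Studio, the same choices are made through the component settings rather than in code: the file name and extension on the read side, and the Compress the data check box on the write side.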