How to handle compressed data - 6.3

Talend Data Fabric Studio User Guide


The information in this section is for subscription-based Talend Studio users only and is not applicable to Talend Open Studio for Big Data users.

In this section, MapReduce components are used to show how to read or write compressed files.

After opening a MapReduce Job in the workspace, use tHDFSInput to read compressed files stored in a given HDFS system. In the Component view of tHDFSInput, you simply need to enter the name of the compressed file to be read, including its extension.
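The file extension matters because Hadoop-style readers generally choose the decompression codec from the file name suffix. The Python sketch below illustrates that idea with standard-library codecs only. It is not Talend or Hadoop code: the `open_by_extension` and `read_lines` helpers, the extension table, and the sample file name are hypothetical and exist only for this illustration.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical mapping from file extension to an opener; illustrative only,
# not the list of formats supported by tHDFSInput.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
}


def open_by_extension(path):
    """Open a file for text reading, choosing the codec from its extension."""
    _, ext = os.path.splitext(path)
    opener = OPENERS.get(ext, open)  # unknown extension: read as plain text
    return opener(path, "rt", encoding="utf-8")


def read_lines(path):
    """Return the decoded lines of a possibly compressed file."""
    with open_by_extension(path) as handle:
        return [line.rstrip("\n") for line in handle]


if __name__ == "__main__":
    # Write a small gzip file to a temporary directory, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "customers.csv.gz")
        with gzip.open(sample, "wt", encoding="utf-8") as out:
            out.write("1;Alice\n2;Bob\n")
        print(read_lines(sample))
```

In this sketch, a file without a known extension is read as plain text, which is why supplying the correct extension is what makes the compressed file readable.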

Note that in the Standard version of tHDFSInput, you instead need to select the Uncompress the data check box and choose the format to decompress the data from.

If you need to write compressed files to the HDFS system, place the tHDFSOutput component in the workspace and select the Compress the data check box to define the format to compress the data to.
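Conceptually, choosing a compression format for output amounts to wrapping the output stream in the selected codec before the rows are written. The Python sketch below shows this with standard-library codecs only. It is an illustration under stated assumptions, not what tHDFSOutput generates: the `write_compressed` helper is hypothetical, and the format names `gzip` and `bzip2` are placeholders rather than the labels offered in Studio.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical format names mapped to an opener and a file suffix;
# the formats offered by tHDFSOutput may differ.
WRITERS = {
    "gzip": (gzip.open, ".gz"),
    "bzip2": (bz2.open, ".bz2"),
}


def write_compressed(base_path, rows, fmt):
    """Write rows as text lines through the chosen codec; return the file path."""
    opener, suffix = WRITERS[fmt]
    path = base_path + suffix
    with opener(path, "wt", encoding="utf-8") as out:
        for row in rows:
            out.write(row + "\n")
    return path


if __name__ == "__main__":
    # Write two rows with the placeholder "gzip" format and show the result path.
    with tempfile.TemporaryDirectory() as tmp:
        target = write_compressed(os.path.join(tmp, "out.csv"), ["1;Alice", "2;Bob"], "gzip")
        print(os.path.basename(target))
```

Appending the codec's suffix to the file name keeps the output readable by any reader that selects its codec from the extension.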

For further information about the components mentioned above and the compression formats supported by the MapReduce version of tHDFSInput, see the Talend Components Reference Guide.