Which big data formats are supported

author
Ciaran Dynes
EnrichVersion
6.4
6.3
6.2
6.1
EnrichProdName
Talend Open Studio for ESB
Talend Data Fabric
Talend ESB
Talend Big Data Platform
Talend Open Studio for MDM
Talend Big Data
Talend Open Studio for Data Quality
Talend Open Studio for Data Integration
Talend Real-Time Big Data Platform
Talend Data Integration
Talend MDM Platform
Talend Open Studio for Big Data
Talend Data Services Platform
Talend Data Management Platform
task
Design and Development > Designing Jobs > Hadoop distributions
EnrichPlatform
Talend Studio

The file types (formats) available in Talend Big Data Studio depend on the component and on the target language of the generated code. For example, tHDFSInput supports both the Text and Sequence file types, but not ORC. Some formats are tied to a specific technology: among Classic Data Integration Jobs, for example, ORC is available only with Hive, and it was specifically developed to improve performance.
Description              | Classic Data Integration Job        | Map/Reduce Job              | Spark Batch Job             | Spark Streaming Job
                         | HDFS         | Pig           | Hive |                             |                             |
-------------------------+--------------+---------------+------+-----------------------------+-----------------------------+----------------------------
Text File                | Yes          | Yes           | Yes  | Option in HDFS components   | Yes                         | Yes
Sequence File            | Yes          | Yes           | Yes  | Option in HDFS components   | Yes                         | Yes
RC                       | No           | Yes           | Yes  | No                          | No                          | No
ORC (since HDP 2.0 only) | No           | No            | Yes  | No                          | Yes                         | Yes
Avro                     | No           | Yes           | Yes  | Specific Avro components    | Specific Avro components    | Specific Avro components
Parquet                  | No           | Yes           | Yes  | Specific Parquet components | Specific Parquet components | Specific Parquet components
JSON                     | Get/Put only | Custom Loader | No   | Specific JSON components    | Specific JSON components    | Specific JSON components
XML                      | No           | Custom Loader | No   | Specific XML components     | Specific XML components     | Specific XML components
Impala Complex Types     | Yes          | Yes           | Yes  | Yes                         | Yes                         | Yes
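
For quick checks, the support matrix above can be encoded as a small lookup table. The following is only a minimal sketch in Python: the names CONTEXTS, SUPPORT, support_for and is_supported are hypothetical and are not part of any Talend API, and the values are copied from the table as published, so they may not reflect later product versions.

```python
# Hypothetical lookup built from the support matrix above; not a Talend API.
# Column order: the three Classic Data Integration contexts, then the other Job types.
CONTEXTS = ("HDFS", "Pig", "Hive", "Map/Reduce", "Spark Batch", "Spark Streaming")

SUPPORT = {
    "Text File": ("Yes", "Yes", "Yes", "Option in HDFS components", "Yes", "Yes"),
    "Sequence File": ("Yes", "Yes", "Yes", "Option in HDFS components", "Yes", "Yes"),
    "RC": ("No", "Yes", "Yes", "No", "No", "No"),
    "ORC": ("No", "No", "Yes", "No", "Yes", "Yes"),
    "Avro": ("No", "Yes", "Yes") + ("Specific Avro components",) * 3,
    "Parquet": ("No", "Yes", "Yes") + ("Specific Parquet components",) * 3,
    "JSON": ("Get/Put only", "Custom Loader", "No") + ("Specific JSON components",) * 3,
    "XML": ("No", "Custom Loader", "No") + ("Specific XML components",) * 3,
    "Impala Complex Types": ("Yes",) * 6,
}


def support_for(file_format, context):
    """Return the table cell for a file format in a given Job context."""
    row = SUPPORT[file_format]
    return row[CONTEXTS.index(context)]


def is_supported(file_format, context):
    """Treat every cell other than a plain 'No' as some form of support."""
    return support_for(file_format, context) != "No"


if __name__ == "__main__":
    print(support_for("JSON", "HDFS"))
    print(is_supported("ORC", "Hive"))
```

Cells such as "Get/Put only" or "Custom Loader" count as supported in this sketch, although they indicate restricted or indirect support, so the text returned by support_for is worth reading before relying on the boolean result.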

The Hadoop community developed each file format with specific features and benefits in mind.

To determine whether a given Talend component supports a file format, check the Component Reference Guide for your version of the Talend Big Data product. Some formats may require a newer version of the product.

Environment

All supported Hadoop distributions.