File formats supported by Talend Data Preparation - 2.0

Talend Data Preparation User Guide

Talend Documentation Team

In Talend Data Preparation, you can import different types of files to use as source data for your datasets.

There are two main types of datasets that you can import:

  • Datasets imported from local files
  • Datasets created from Talend Jobs

From local files

You can import the following file types to use as datasets:

  • .xls or .xlsx
  • .csv

For more information, see Adding a dataset from a local file.
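As a rough illustration of the list above, the following sketch checks whether a local file name carries one of the listed extensions before you try to import it. It is not part of Talend Data Preparation: the function name `is_supported_local_file` is hypothetical, and the case-insensitive comparison is an assumption made for convenience.

```python
from pathlib import Path

# File extensions listed in this section as importable from local files.
SUPPORTED_EXTENSIONS = {".xls", ".xlsx", ".csv"}


def is_supported_local_file(filename: str) -> bool:
    """Return True if the file name ends with one of the listed extensions."""
    # Assumption: compare case-insensitively, so "REPORT.CSV" is treated
    # like "report.csv". Only the last extension of the name is examined.
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_local_file("customers.csv")` is true, while a compressed file such as `archive.csv.gz` is rejected because its final extension is `.gz`.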

From a Talend Job

In addition to the previous file types, if you are a subscription user, you can use datasets created directly from a Talend Job in Talend Studio.

To do so, use the tDatasetOutput component as the output of your Job in Talend Studio.

Then you can either:

From a database

Talend Data Preparation can connect to various databases and use them as a source to create a new dataset. The data remains stored in your database, and only a sample is retrieved on demand.

For more information, see Adding a dataset from a database.
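The idea of retrieving only a sample while the full data stays in the database can be sketched as follows. This is a minimal illustration, not the way Talend Data Preparation is implemented: it uses an in-memory SQLite database as a stand-in source, and the `fetch_sample` function, the `customers` table, and the sample size of 3 are all hypothetical.

```python
import sqlite3

SAMPLE_SIZE = 3  # hypothetical sample size, chosen only for this illustration


def fetch_sample(connection: sqlite3.Connection, table: str, limit: int = SAMPLE_SIZE):
    """Return at most `limit` rows from `table`; the other rows stay in the database."""
    # Table names cannot be bound as SQL parameters, so quote the name as an identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = connection.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,))
    return cursor.fetchall()


# Build a small in-memory database to stand in for a real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer_{i}") for i in range(10)],
)

# Only the sample is read into memory; the table itself is left unchanged.
sample = fetch_sample(conn, "customers")
total = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Here `sample` holds three rows while `total` still reports ten rows in the source table.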


From HDFS

You can access data stored on a Hadoop file system (HDFS) and import it as a dataset directly in the Talend Data Preparation interface. You can then export the prepared data back to the cluster, or export it as a local file.

For more information, see Adding a dataset from HDFS.