Supported input and output formats - 8.0

Talend Data Preparation User Guide

Version: 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data
Last publication date: 2024-03-26

Talend Data Preparation supports several formats, both as sources for your datasets and as outputs when you export the result of your preparations.

From a local file

You can import the following file types to use as datasets:

  • .xls or .xlsx
  • Files with separator
Note: Positional (fixed-width) files are not supported.
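Because positional files are not supported, you may want to check that a local text file really uses a separator before you import it. The following minimal sketch uses the `csv.Sniffer` class from Python's standard library to guess the separator from a sample of the file. The function name `detect_separator` and the list of candidate separators are illustrative assumptions, and this check is independent of how Talend Data Preparation itself parses files.

```python
import csv
from typing import Optional

# Candidate separators to look for; this list is an assumption, adjust it to your files.
CANDIDATE_SEPARATORS = ",;\t|"


def detect_separator(sample: str) -> Optional[str]:
    """Return the guessed separator of a text sample, or None if none is found."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=CANDIDATE_SEPARATORS)
    except csv.Error:
        # No candidate separator appears consistently, as in a positional file.
        return None
    return dialect.delimiter


print(detect_separator("id;name;city\n1;Alice;Paris\n2;Bob;Lyon\n"))  # prints ;
print(detect_separator("0001Alice     Paris\n0002Bob       Lyon \n"))  # prints None
```

A `None` result suggests the file is positional or uses an unexpected separator, so it is worth inspecting before you try to import it.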

Preparations based on local files can be exported to the following formats:

  • Local file with separator
  • Local .xlsx
  • Local Tableau
  • Amazon S3

For more information, see Adding a dataset from a local file.

From a Talend Job

In addition to the file types listed above, you can use datasets created directly from a Talend Job in Talend Studio.

To do so, use the tDatasetOutput component as the output of your Job in Talend Studio. For more information, see the documentation for the Data Preparation components.

Preparations based on Talend Jobs can be exported to the following formats:

  • Local file with separator
  • Local .xlsx
  • Local Tableau
  • Amazon S3

From a database

Talend Data Preparation can connect to various databases and use them as a source to create a new dataset. The data remains stored in your database, and only a sample is retrieved on demand.
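The principle of keeping the data in the database and fetching only a sample can be illustrated with a bounded query. The sketch below uses an in-memory SQLite database from Python's standard library purely as a stand-in for your database; the `customers` table, its 1,000 rows, and the sample size of 3 are hypothetical, and the query does not reflect how Talend Data Preparation actually queries your database.

```python
import sqlite3


def fetch_sample(conn: sqlite3.Connection, table: str, size: int) -> list[tuple]:
    """Return at most `size` rows from `table`; the full table stays in the database."""
    # The table name is interpolated for brevity: only pass trusted names here.
    cursor = conn.execute(f"SELECT * FROM {table} LIMIT ?", (size,))
    return cursor.fetchall()


# Hypothetical stand-in database holding 1,000 customer rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [(f"customer_{i}",) for i in range(1000)],
)

sample = fetch_sample(conn, "customers", 3)
print(len(sample))  # prints 3: only the sample is loaded into memory
```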

Preparations based on database datasets can be exported to the following formats:

  • Local file with separator
  • Local .xlsx
  • Local Tableau
  • Amazon S3

For more information, see Adding a dataset from a database.

From HDFS

You can access data stored on a Hadoop Distributed File System (HDFS) and import it as a dataset directly in the Talend Data Preparation interface.

You can import the following file types stored on HDFS:

  • File with separator
  • .xlsx
  • Avro
  • Parquet
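If you are not sure which of these types a given file is, its first bytes usually indicate it. The following sketch checks well-known signatures: Avro object container files start with `Obj` followed by the byte 1, Parquet files start and end with `PAR1`, and .xlsx files are ZIP archives, which start with `PK\x03\x04`. The function name `guess_format` is hypothetical. It works on bytes you have already read, so it does not depend on any HDFS client, and it is a heuristic rather than a validation, because other ZIP-based formats share the .xlsx signature.

```python
def guess_format(data: bytes) -> str:
    """Guess the file type from its raw bytes (a heuristic, not a validation)."""
    if data.startswith(b"Obj\x01"):
        return "avro"  # Avro object container file magic bytes
    if len(data) >= 8 and data.startswith(b"PAR1") and data.endswith(b"PAR1"):
        return "parquet"  # Parquet files begin and end with "PAR1"
    if data.startswith(b"PK\x03\x04"):
        return "xlsx"  # ZIP container: .xlsx is ZIP-based, but so are other formats
    return "unknown"  # may be a file with separator: inspect its content


print(guess_format(b"Obj\x01" + b"\x00" * 10))        # prints avro
print(guess_format(b"PAR1" + b"\x00" * 10 + b"PAR1"))  # prints parquet
print(guess_format(b"id,name\n1,Alice\n"))             # prints unknown
```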

Preparations based on HDFS files can be exported to the following formats:

  • Local file with separator
  • Local .xlsx
  • Local Tableau
  • Amazon S3

For more information, see Adding a dataset from HDFS.

From Salesforce

You can access data stored on Salesforce and import it as a dataset directly in the Talend Data Preparation interface.

Preparations based on Salesforce datasets can be exported to the following formats:

  • Local file with separator
  • Local .xlsx
  • Local Tableau
  • Amazon S3

For more information, see Adding a dataset from Salesforce.

From Amazon S3

You can access data stored on Amazon S3 and import it as a dataset directly in the Talend Data Preparation interface.

You can import the following file types stored on Amazon S3:

  • File with separator
  • .xlsx
  • Avro
  • Parquet

Preparations based on Amazon S3 files can be exported to the following formats:

  • Local file with separator
  • Local .xlsx
  • Local Tableau
  • Amazon S3

For more information, see Adding a dataset from Amazon S3.

From Azure Data Lake Storage Gen2

You can access data stored on Azure Data Lake Storage Gen2 (ADLS Gen2) and import it as a dataset directly in the Talend Data Preparation interface.

You can import the following file types stored on ADLS Gen2:

  • File with separator
  • Avro
  • Parquet
  • JSON

Preparations based on ADLS Gen2 datasets can be exported to the following formats:

  • Local file with separator
  • Local .xlsx
  • Local Tableau
  • Amazon S3

For more information, see Adding a dataset from Azure Data Lake Storage Gen2.
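If you script checks around your imports and exports, the file types listed on this page can be captured in a small lookup table. The following Python sketch only encodes those lists. The names `IMPORT_TYPES`, `EXPORT_FORMATS`, and `can_import` are hypothetical and are not part of any Talend API. Sources that do not list file types, such as databases, Talend Jobs, and Salesforce, are left out. Because every source on this page shares the same export formats, those formats are stored only once.

```python
# Import file types per source, as listed on this page.
IMPORT_TYPES: dict[str, set[str]] = {
    "Local file": {".xls", ".xlsx", "File with separator"},
    "HDFS": {"File with separator", ".xlsx", "Avro", "Parquet"},
    "Amazon S3": {"File with separator", ".xlsx", "Avro", "Parquet"},
    "ADLS Gen2": {"File with separator", "Avro", "Parquet", "JSON"},
}

# Every source on this page lists the same export formats.
EXPORT_FORMATS: tuple[str, ...] = (
    "Local file with separator",
    "Local .xlsx",
    "Local Tableau",
    "Amazon S3",
)


def can_import(source: str, file_type: str) -> bool:
    """Return True if `file_type` is listed as importable from `source`."""
    return file_type in IMPORT_TYPES.get(source, set())


print(can_import("HDFS", "Parquet"))     # prints True
print(can_import("ADLS Gen2", ".xlsx"))  # prints False
print(can_import("Local file", "Avro"))  # prints False
```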