In Talend Data Preparation, you can import different types of files to use as source data for your datasets.
There are two main types of datasets that you can import:
- Datasets imported from local files
- Datasets created from Talend Jobs
From local files
You can import the following file types to use as datasets:
- .xls or .xlsx
For more information, see Adding a dataset from a local file.
From a Talend Job
In addition to the previous file types, you can use datasets created directly from a Talend Job in Talend Studio if you are a subscription user. To do so, use the tDatasetOutput component as the output of your Job in Talend Studio.
Then you can either:
- Run the Job directly in Talend Studio. For more information, see Creating a dataset from a Talend Job.
- Use the live dataset feature to run the Job via Talend Administration Center and access the data directly in Talend Data Preparation. For more information, see Creating a dataset based on an on-demand Job execution.
From a database
Talend Data Preparation can connect to various databases and use them as a source to create new datasets. The data remains stored in your database, and only a sample is retrieved on demand.
For more information, see Adding a dataset from a database.
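As a rough illustration of what retrieving only a sample means, the following sketch uses Python's built-in sqlite3 module with an in-memory database. It does not show how Talend Data Preparation works internally: the customers table, the fetch_sample helper, and the sample size of 100 are all hypothetical and chosen only for the example.

```python
import sqlite3

# Hypothetical stand-in for a source database; the table and column
# names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [(f"customer_{i}",) for i in range(1000)],
)


def fetch_sample(connection, table, sample_size):
    """Return at most sample_size rows; the full table stays in the database."""
    # The table name is interpolated directly because this sketch only
    # ever passes a fixed, trusted name.
    cursor = connection.execute(f"SELECT * FROM {table} LIMIT ?", (sample_size,))
    return cursor.fetchall()


sample = fetch_sample(conn, "customers", 100)
total_rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(len(sample), total_rows)
```

Only the sampled rows are copied to the client. The remaining rows are never transferred and stay in the source table.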
From HDFS
You can access data stored on a Hadoop file system (HDFS) and import it as a dataset directly in the Talend Data Preparation interface. You can then export the prepared data back to the cluster, or export it as a local file.
For more information, see Adding a dataset from HDFS.