Exporting a preparation made on a database dataset - 2.1

Talend Data Preparation User Guide

author
Talend Documentation Team
EnrichVersion
6.4
2.1
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
task
Data Quality and Preparation > Cleansing data
EnrichPlatform
Talend Data Preparation
When you are finished preparing a dataset extracted from a database, you may want to export your data.

Procedure

  1. Click the Export button in the application header bar.
  2. If the result of your preparation is larger than your current sample size, 10,000 rows by default, select an export option:
    • If you select Current sample, only the sample you have been working on will be exported.
    • If you select All data, all the preparation steps you have performed on your sample will be applied to the rest of the dataset as well.
  3. Choose between exporting your data to a local file or to a Hadoop cluster:
    • If you export your data as a local csv or xlsx file, the export operation will be processed on the Talend Data Preparation server.
    • If you export your data to the Hadoop cluster, the export operation will be processed directly on the cluster. Choose csv, avro, or parquet as the output file type, and enter the path to your preferred location on the cluster to save your file. If you choose to authenticate via Kerberos, enter your principal and the path to your keytab file.
  4. Click Confirm.
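The choices in the steps above can be summarized as a small decision helper. This is an illustrative sketch only, not Talend's implementation: the function name, the default sample size constant, and the format sets are assumptions derived from the options described in this procedure.

```python
# Illustrative sketch of the export choices described above.
# Not Talend's implementation: names and defaults are assumptions.

DEFAULT_SAMPLE_SIZE = 10_000  # default sample size mentioned in step 2

LOCAL_FORMATS = {"csv", "xlsx"}               # local file export (step 3)
CLUSTER_FORMATS = {"csv", "avro", "parquet"}  # Hadoop cluster export (step 3)


def choose_export(total_rows, destination, file_format,
                  export_option="all_data", sample_size=DEFAULT_SAMPLE_SIZE):
    """Return how many rows end up in the output for a given set of choices."""
    allowed = LOCAL_FORMATS if destination == "local" else CLUSTER_FORMATS
    if file_format not in allowed:
        raise ValueError(f"{file_format!r} is not supported for {destination} export")
    if total_rows <= sample_size:
        # The whole dataset fits in the sample: no export option to choose.
        return total_rows
    if export_option == "current_sample":
        return sample_size   # only the working sample is exported
    return total_rows        # all preparation steps applied to the full dataset
```

For example, exporting a 50,000-row preparation to a local csv file with the Current sample option yields only the 10,000-row sample, while All data to a parquet file on the cluster yields all 50,000 rows.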

Results

In the case of an export to a local file, if you chose to export only the Current sample, the download starts automatically. If you selected All data, the export process is launched in the background. You can check the status of the export and download your output file in the Export history page. For more information, see The export history page.

The export process triggers a refresh of the data fetched from the database, guaranteeing that the data displayed in the output is always up to date.

However, due to this refresh, a dataset that was originally smaller than 10,000 rows may now exceed this limit. In this case:

  • If you export to a local file, only the sample is kept.
  • If you export to a Hadoop cluster, the whole data is exported.
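This edge case can be sketched in a few lines. The helper below is hypothetical, assuming the default 10,000-row sample limit stated above; it only restates the two bullet points as code.

```python
# Hypothetical sketch of the post-refresh size edge case described above.
# A dataset smaller than the sample limit may grow past it after the refresh.

SAMPLE_LIMIT = 10_000  # default sample size


def rows_exported_after_refresh(refreshed_rows, destination):
    """Rows present in the output when the refreshed dataset is re-exported."""
    if refreshed_rows <= SAMPLE_LIMIT:
        return refreshed_rows   # limit not exceeded: nothing special happens
    if destination == "local":
        return SAMPLE_LIMIT     # local file: only the sample is kept
    return refreshed_rows       # Hadoop cluster: the whole data is exported
```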