Now that you are done preparing your data, you will export it back to the
cluster, but as a Parquet file this time.
Note that the cluster to which you export your cleansed data must be the same cluster
from which you imported the data in the first place.
Procedure
-
Click the Export button in the application header
bar.
-
Select the All data radio button so that the entire dataset
is processed, and not just the sample you worked on.
-
Select the HDFS file radio button to export your data to
the Hadoop cluster.
-
Select the Parquet format.
-
In the Output path field, enter the complete URL to your
preferred location on the cluster to save the exported file.
You can manually configure Talend Data Preparation to display a default
value in the Output path
field.
-
Select Specified Kerberos as the authentication
method.
-
Specify your principal and the path to your keytab
file.
If you choose Default Kerberos, the values for the
keytab file path and the principal are the ones
entered in the Talend Data Preparation
configuration file.
In any case, the path must point to a keytab file that
is accessible to all the workers on the cluster.
Select Simple authentication if you are not using
Kerberos.
-
Click Confirm.
Your export starts in the background and is processed directly on
the cluster.
Note that if a preparation contains actions that affect only a single row or
individual cells, those actions are skipped during the export process. A warning is
displayed before the export if your preparation contains such actions.
-
Click the Export history button in the application
header bar to check the status of the export.
Among other information, the export history shows whether the export was successful.
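The Output path step asks for a complete URL rather than a bare path. As a rough illustration of what "complete" means here, the following minimal Python sketch checks that a value has a scheme, a host, and a path. It assumes the cluster is addressed with the hdfs:// scheme. The function name, host name, port, and path are hypothetical and are not part of Talend Data Preparation.

```python
from urllib.parse import urlsplit


def is_complete_hdfs_url(url: str) -> bool:
    """Return True if url has an hdfs scheme, a host, and a non-root path."""
    parts = urlsplit(url)
    return (
        parts.scheme == "hdfs"        # assumption: the cluster uses hdfs://
        and bool(parts.hostname)      # a host (for example the NameNode) is present
        and len(parts.path) > 1       # more than just "/"
    )


# Hypothetical values: replace the host, port, and path with those of your cluster.
print(is_complete_hdfs_url("hdfs://namenode.example.com:8020/user/talend/export.parquet"))
print(is_complete_hdfs_url("/user/talend/export.parquet"))
```

The first call passes the check because the scheme, host, and path are all present. The second fails because a bare path has no scheme or host.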
Results
Your data has been processed and saved as a Parquet file,
without leaving the cluster.
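If you want a quick sanity check of the result outside the application, one option is to look at the file's magic number: an unencrypted Parquet file begins and ends with the four bytes PAR1. The sketch below is only an illustration and assumes you have first copied the exported file from the cluster to a local path. The function name and the file name in the usage note are hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"  # marker at the start and end of an unencrypted Parquet file


def looks_like_parquet(path: str) -> bool:
    """Check the 4-byte magic number at both ends of a local file."""
    # A file too small to hold both markers cannot be a Parquet file.
    if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

For example, `looks_like_parquet("export.parquet")` could be called on a local copy of the exported file. This confirms only the file format marker, not the content. The export history remains the reference for the export status.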