This scenario applies only to subscription-based Talend Platform products with Big Data and Talend Data Fabric.
For more technologies supported by Talend, see Talend components.
The following scenario creates a two-component Job that transforms data in a Spark environment using a map that was previously created in Talend Data Mapper.
tHDFSConfiguration is used in this scenario by Spark to connect to the HDFS system to which the JAR files the Job depends on are transferred. Which configuration component or field to use depends on your Spark mode and distribution (a configuration sketch follows the list below):
- Yarn mode (YARN client or YARN cluster):
  - When using Google Dataproc, specify a bucket in the Google Storage staging bucket field in the Spark configuration tab.
  - When using HDInsight, specify the blob to be used for Job deployment in the Windows Azure Storage configuration area in the Spark configuration tab.
  - When using Altus, specify the S3 bucket or the Azure Data Lake Store (technical preview) to be used for Job deployment in the Spark configuration tab.
  - When using other distributions, use the configuration component corresponding to the file system your cluster uses. Typically, this is HDFS, so use tHDFSConfiguration.
- Standalone mode: use the configuration component corresponding to the file system your cluster uses, such as tHDFSConfiguration or tS3Configuration.
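For illustration only, the snippet below is a minimal sketch of what a configuration component such as tHDFSConfiguration contributes to the generated Spark Job: it points Spark's Hadoop layer at the cluster's file system so the transferred JAR files and data can be resolved. Talend Studio generates this wiring from the component's settings; the class name, application name, and NameNode URI here are hypothetical placeholders.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical sketch of the file-system connection a component such as
// tHDFSConfiguration supplies to the Job. Shown only to illustrate the
// role the component plays; Talend Studio generates the real code.
public class HdfsConfigSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("tdm_spark_job") // placeholder Job name
                // spark.hadoop.* properties are forwarded to the Hadoop
                // Configuration; fs.defaultFS identifies the cluster's
                // file system (placeholder NameNode URI below).
                .set("spark.hadoop.fs.defaultFS", "hdfs://namenode:8020");

        // The Spark master (e.g. YARN) is supplied by spark-submit at
        // deployment time, so it is not hard-coded here.
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // The Job's components (for example, the Talend Data Mapper
            // transformation) would execute against this context.
        }
    }
}
```

For a Standalone cluster backed by S3 rather than HDFS, the equivalent would be the fs.s3a connection properties that tS3Configuration provides instead of fs.defaultFS.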