Transferring data from HDFS to Amazon S3 - Spark framework

Amazon S3

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
Talend Data Integration
Talend ESB
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Open Studio for Big Data
Talend Open Studio for ESB
Talend Data Services Platform
Talend Big Data
Talend MDM Platform
Talend Open Studio for Data Integration
EnrichPlatform
Talend Studio

The following instructions show how to read a file from HDFS, process it, and save the results to Amazon S3 using a Big Data Batch - Spark Job.

For more technologies supported by Talend, see Talend components.

Because Spark does not depend on a specific file system, you must specify which file system your Spark Job uses.
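
For readers who want to see what this file system choice looks like outside Talend Studio, the following is a minimal PySpark sketch, not the code that Talend generates. It assumes the S3A connector (the hadoop-aws library) is available to Spark. The function names, the NameNode URI, the paths, and the filtering step are hypothetical placeholders chosen for illustration.

```python
def filesystem_conf(namenode_uri, s3_access_key, s3_secret_key):
    """Build the Spark properties that tell a job which file systems to use.

    Spark copies every property prefixed with "spark.hadoop." into the
    Hadoop configuration, which is where file systems are resolved.
    """
    return {
        # Default file system: paths without a scheme resolve against HDFS.
        "spark.hadoop.fs.defaultFS": namenode_uri,
        # Credentials for the S3A connector (assumes hadoop-aws is installed).
        "spark.hadoop.fs.s3a.access.key": s3_access_key,
        "spark.hadoop.fs.s3a.secret.key": s3_secret_key,
    }


def transfer(conf, hdfs_path, s3_path):
    """Read a text file from HDFS, process it, and write the result to S3."""
    # Imported here so filesystem_conf() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("hdfs-to-s3")
    for key, value in conf.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    try:
        lines = spark.read.text(hdfs_path)
        # Placeholder processing step (an assumption): drop empty lines.
        result = lines.filter(lines.value != "")
        result.write.mode("overwrite").text(s3_path)
    finally:
        spark.stop()
```

A call such as `transfer(filesystem_conf("hdfs://namenode:8020", access_key, secret_key), "/user/data/in.txt", "s3a://my-bucket/out")` would then read from HDFS through the default file system and write to S3 through the explicit `s3a://` scheme. The host, port, bucket, and paths in this call are made up for the example.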