How a Talend Job for Apache Spark works - Cloud - 8.0

Talend Studio User Guide

Talend Big Data
Talend Big Data Platform
Talend Cloud
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Talend Studio
Design and Development
Last publication date
Available in...

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

Using the Spark-specific components, a Talend Spark Job makes use of the Spark framework to process RDDs (Resilient Distributed Datasets) on top of a given Spark cluster.

Available in:

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

Depending on which framework you select for the Spark Job you are creating, this Talend Spark Job implements the Spark Streaming framework or the Spark framework when generating its code.

A Talend Spark Job can be run in any of the following modes:

  • Local: Talend Studio builds the Spark environment in itself at runtime to run the Job locally in Talend Studio. With this mode, each processor of the local machine is used as a Spark worker to perform the computations. This mode requires minimum parameters to be set in this configuration view.

    Note this local machine is the machine in which the Job is actually run.

  • Standalone: Talend Studio connects to a Spark-enabled cluster to run the Job from this cluster.