Running a Job on Spark or YARN
In this tutorial, you create a Big Data Batch Job that runs on Spark or YARN and reads data from HDFS.
Creating a Talend Studio project
Creating a project is the first step in using Talend Studio. Projects help you organize your work.
Procedure
Results
Creating a Big Data Batch Job to use Spark or YARN
For Big Data processing, Talend Studio allows you to create Batch Jobs and Streaming Jobs running on Spark or MapReduce.
Before you begin
Procedure
Results
Running a Job on Spark
In this tutorial, you learn how to run a Talend Studio Job on Spark.
Configuring a HDFS connection to run on Spark
Using the tHDFSConfiguration component, you can connect your HDFS filesystem to Spark.
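As a rough picture of what "connecting HDFS to Spark" amounts to, here is a minimal Python sketch. It assumes the component's role is to pass the NameNode URI from your Hadoop cluster metadata to Spark as the Hadoop property fs.defaultFS. Spark does copy any property prefixed with spark.hadoop. into the Hadoop configuration it uses. The names HadoopClusterMetadata, spark_hdfs_conf and to_submit_args are hypothetical. The host namenode.example.com and port 8020 are placeholders. This is not how Talend Studio implements the component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HadoopClusterMetadata:
    """Hypothetical stand-in for values stored in Repository metadata."""
    namenode_host: str
    namenode_port: int = 8020  # common NameNode RPC port; your cluster may differ


def spark_hdfs_conf(meta: HadoopClusterMetadata) -> dict[str, str]:
    """Return Spark properties that point the Job's Hadoop configuration at HDFS.

    Spark copies properties prefixed with "spark.hadoop." into the Hadoop
    Configuration it uses for file system access, so this fs.defaultFS
    becomes the default file system for paths without a scheme.
    """
    namenode_uri = f"hdfs://{meta.namenode_host}:{meta.namenode_port}"
    return {"spark.hadoop.fs.defaultFS": namenode_uri}


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render properties as spark-submit --conf arguments (sorted for stability)."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    meta = HadoopClusterMetadata("namenode.example.com")
    print(to_submit_args(spark_hdfs_conf(meta)))
```

In the Studio you set these values through the metadata and the component view rather than by hand. The sketch only makes the resulting setting visible.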
Before you begin
- This tutorial uses a Hadoop cluster, so you must have one available.
- You must also have HDFS metadata configured (see Creating a Hadoop cluster metadata definition and Importing a Hadoop cluster metadata definition).
Procedure
- In the Repository, expand , then expand the Hadoop cluster metadata of your choice.
- Click OK.
Results
What to do next
Reading data from a HDFS connection on Spark
Using predefined HDFS metadata, you can read data from an HDFS filesystem on Spark.
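Because the metadata already supplies the NameNode URI, an input file path only needs to name a location on the cluster. This minimal Python sketch assumes that scheme-less paths are qualified against fs.defaultFS, as Hadoop does. Paths that already carry a scheme are kept unchanged. resolve_hdfs_path is a hypothetical helper, and the host and file names are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve_hdfs_path(path: str, default_fs: str) -> str:
    """Qualify a file path against fs.defaultFS, in the spirit of Hadoop.

    - A path that already has a scheme (hdfs://, file://, ...) is kept as-is.
    - An absolute path without a scheme, such as /user/talend/in.csv,
      is placed on the default file system.
    """
    if urlsplit(path).scheme:
        return path
    if not path.startswith("/"):
        # Hadoop resolves relative paths against the user's working directory
        # (usually /user/<name>); this sketch rejects them to stay simple.
        raise ValueError(f"expected an absolute path, got {path!r}")
    fs = urlsplit(default_fs)
    return urlunsplit((fs.scheme, fs.netloc, path, "", ""))


if __name__ == "__main__":
    print(resolve_hdfs_path("/user/talend/in.csv", "hdfs://namenode.example.com:8020"))
```

This is also why a Job can point at another cluster by changing only the metadata. Paths entered as absolute HDFS locations follow whatever fs.defaultFS the connection provides.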
Before you begin
- This tutorial uses a Hadoop cluster, so you must have one available.
- You must also have HDFS metadata configured (see Creating a Hadoop cluster metadata definition and Importing a Hadoop cluster metadata definition).
- You must have configured your HDFS connection on Spark (see Configuring a HDFS connection to run on Spark).
Procedure
Results
Running a Job on YARN
In this tutorial, you learn how to run a Talend Studio Job on YARN.
Configuring a HDFS connection to run on YARN
Using the tHDFSConfiguration component, you can connect your HDFS filesystem to YARN.
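For comparison with the Spark case, this minimal Python sketch lists the standard Hadoop properties that a connection to HDFS on YARN comes down to. It assumes "running on YARN" here means a Batch Job on the MapReduce framework submitted to YARN. The property names are real Hadoop settings. The functions yarn_job_properties and to_hadoop_xml are hypothetical, and the hosts are placeholders. Ports 8020 (NameNode) and 8032 (ResourceManager) are common defaults that your cluster may override.

```python
import xml.etree.ElementTree as ET


def yarn_job_properties(namenode_uri: str, resourcemanager: str) -> dict[str, str]:
    """Hadoop properties a MapReduce Job needs to use HDFS and run on YARN."""
    return {
        "fs.defaultFS": namenode_uri,                      # default file system for scheme-less paths
        "mapreduce.framework.name": "yarn",                # submit to YARN instead of local mode
        "yarn.resourcemanager.address": resourcemanager,   # host:port of the ResourceManager
    }


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Serialize properties in Hadoop's <configuration><property> file format."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    props = yarn_job_properties("hdfs://namenode.example.com:8020",
                                "resourcemanager.example.com:8032")
    print(to_hadoop_xml(props))
```

Talend Studio derives these settings from the Hadoop cluster metadata. The XML form matches the layout of files such as core-site.xml and mapred-site.xml, which can help when comparing the Job's settings with the cluster's own configuration.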
Before you begin
- This tutorial uses a Hadoop cluster, so you must have one available.
- You must also have HDFS metadata configured (see Creating a Hadoop cluster metadata definition and Importing a Hadoop cluster metadata definition).
Procedure
Results
What to do next
Reading data from a HDFS connection on YARN
Using predefined HDFS metadata, you can read data from an HDFS filesystem on YARN.
Before you begin
- This tutorial uses a Hadoop cluster, so you must have one available.
- You must also have HDFS metadata configured (see Creating a Hadoop cluster metadata definition and Importing a Hadoop cluster metadata definition).