Creating the Spark Batch Job - 8.0

Talend Real-Time Big Data Platform Getting Started Guide

Version: 8.0
Language: English
Operating system: Real-Time Big Data Platform
Product: Talend Real-Time Big Data Platform
Module: Talend Administration Center, Talend Installer, Talend Runtime, Talend Studio
Content: Data Quality and Preparation > Cleansing data; Data Quality and Preparation > Profiling data; Design and Development; Installation and Upgrade
Last publication date: 2024-03-13

A Talend Job for Apache Spark Batch gives you access to the Talend Spark components, which you use to visually design Apache Spark programs that read, transform, or write data.

Before you begin

  • You have launched Talend Studio and opened the Integration perspective.

Procedure

  1. In the Repository tree view, expand the Job Designs node, right-click the Big Data Batch node and select Create folder from the contextual menu.
  2. In the New Folder wizard, name your Job folder getting_started and click Finish to create your folder.
  3. Right-click the getting_started folder and select Create folder again.
  4. In the New Folder wizard, name the new folder spark and click Finish to create the folder.
  5. Right-click the spark folder and select Create Big Data Batch Job.
  6. In the New Big Data Batch Job wizard, select Spark from the Framework drop-down list.
  7. Enter a name for this Spark Batch Job and other useful information.

    For example, enter aggregate_movie_director_spark in the Name field.

Results

The Spark Batch component Palette is now available in Talend Studio. You can start designing the Job with the components in this Palette and the Metadata node in the Repository.
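
As a rough illustration of the read, transform, write pattern that a Spark Batch Job follows, here is a minimal plain-Python sketch. It is not Spark code and not what Talend Studio generates. It assumes, based only on the example Job name aggregate_movie_director_spark, that the Job will eventually aggregate movies by director. The sample data, the column names and the function count_movies_per_director are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample input; a real Job would read from a file system or cluster.
SAMPLE_CSV = """title,director
Movie A,Director X
Movie B,Director Y
Movie C,Director X
"""


def count_movies_per_director(csv_text):
    """Read CSV rows, count movies per director, and return the result as CSV text."""
    # Read: parse the input into one dictionary per row.
    rows = csv.DictReader(io.StringIO(csv_text))

    # Transform: aggregate the rows by the (assumed) "director" column.
    counts = Counter(row["director"] for row in rows)

    # Write: emit one output row per director, sorted by name.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["director", "movie_count"])
    for director, movie_count in sorted(counts.items()):
        writer.writerow([director, movie_count])
    return out.getvalue()


if __name__ == "__main__":
    print(count_movies_per_director(SAMPLE_CSV))
```

In a Spark Batch Job, each of these three stages would typically be handled by a component dropped from the Palette onto the design workspace rather than by hand-written code.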