Creating a Spark Job - 7.1

Talend Real-time Big Data Platform Studio User Guide

Author: Talend Documentation Team
Version: 7.1
Product: Talend Real-Time Big Data Platform
Task: Design and Development
Platform: Talend Studio

You can create a Spark Job either from the Job Designs node of the Repository tree view in the Integration perspective or from the Big Data Batch node under the Job Designs node.

The two approaches are similar, so the following procedure shows how to create a Spark Job from the Job Designs node.

Procedure

  1. Right-click the Job Designs node and, in the contextual menu, select Create Big Data Batch Job.
    The New Big Data Batch Job wizard opens.
  2. From the Framework drop-down list, select Spark.
  3. In the Name, Purpose, and Description fields, enter descriptive information for the Job. The Job name is mandatory.
    Once you have entered it, the Finish button becomes active.
  4. To change the Job version, click the M and m buttons next to the Version field.
    To change the Job status, select a status from the Status drop-down list.
    To edit the information in the read-only fields, select File > Edit Project properties from the menu bar to open the Project Settings dialog box, and make the changes there.
  5. Click Finish to close the wizard and validate the changes.
    An empty Job opens in the workspace of the Studio, and the components available for Spark appear in the Palette.

Results

In the Repository tree view, the new Spark Job appears automatically under the Big Data Batch node under Job Designs.

Next, drop the components you need from the Palette onto the workspace, then link and configure them to design the Spark Job, the same way you do for a standard Job. You also need to set up the connection to the Spark cluster to be used, in the Spark configuration tab of the Run view.

You can repeat the same operations to create a Spark Streaming Job. The only differences are that you select Create Big Data Streaming Job from the contextual menu after right-clicking the Job Designs node, and that you then select Spark Streaming from the Framework drop-down list in the New Big Data Streaming Job wizard that opens.

Note that if you need to run your Spark Job in a mode other than Local mode, the same Job must also contain a storage component, typically tHDFSConfiguration. Spark uses this component to connect to the file system to which the jar files that the Job depends on are transferred.
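The storage-component rule above can be expressed as a simple check. The following is a minimal, hypothetical Python sketch, not a Talend API: it assumes a Job can be modeled as a run-mode string plus a list of component names, and it treats tHDFSConfiguration as the only recognized storage component, which is an illustrative simplification rather than an exhaustive list.

```python
# Hypothetical model of the rule, NOT a Talend API: a Job is described by
# its run mode (a plain string) and the names of the components it contains.
# Only tHDFSConfiguration is listed here; this set is an assumption and is
# not an exhaustive list of Talend storage components.
STORAGE_COMPONENTS = {"tHDFSConfiguration"}


def needs_storage_component(run_mode: str) -> bool:
    """Any mode other than Local transfers the Job's dependent jars to a file system."""
    return run_mode.strip().lower() != "local"


def check_spark_job(run_mode: str, components: list) -> list:
    """Return a list of problems; an empty list means the rule is satisfied."""
    problems = []
    if needs_storage_component(run_mode) and not STORAGE_COMPONENTS.intersection(components):
        problems.append(
            f"Run mode '{run_mode}' requires a storage component "
            "(for example tHDFSConfiguration) in the same Job."
        )
    return problems


if __name__ == "__main__":
    # Local mode: no storage component needed.
    print(check_spark_job("Local", ["tFixedFlowInput", "tLogRow"]))
    # A non-Local mode without a storage component: one problem reported.
    print(check_spark_job("YARN client", ["tFixedFlowInput", "tLogRow"]))
    # A non-Local mode with tHDFSConfiguration present: rule satisfied.
    print(check_spark_job("YARN client", ["tHDFSConfiguration", "tLogRow"]))
```

The run-mode strings such as "YARN client" are used here only as example labels; in the Studio, the mode is chosen in the Spark configuration tab of the Run view, not passed as a string.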

You can also create these types of Jobs by writing their Job scripts in the Jobscript view and then generating the Jobs from them. For more information on using Job scripts, see the Talend Job scripts reference guide at https://help.talend.com/.