Setting up the Spark connection - 6.3

How to Use Spark 2.0 with Studio 6.3

Define the Spark configuration in the Studio so that the Spark 2.0 jar files can be imported.

Procedure

  1. In Studio, open the Job you want to run with Spark 2.0.
  2. To open the Run view, double-click Run.
  3. Click the Spark configuration tab.
  4. Clear the Use local mode check box.
  5. From the Distribution drop-down list, select Custom - Unsupported. This lets you import Spark jar files that are not natively supported by your Hadoop distribution.
  6. From the Spark version drop-down list, select 2.0.
  7. To open the Import Custom Definition wizard, next to the Distribution list, click the ellipsis (...).
  8. Select the Import from existing version radio button and choose your distribution. Ensure that the Spark check box is selected.
  9. Click OK, and in the pop-up dialog box, click Yes. The [Custom Hadoop Version Definition] wizard opens.
  10. In the jar list, remove all entries except the one for talend-mapred-lib.jar. If you run your Job on Windows, also keep winutils-hadoop-2.6.0.exe.
  11. To open the [Select Libraries] wizard, click the plus symbol (+), then select External libraries.
  12. To locate and select the Spark jar files you downloaded from your cluster earlier, click Browse....

    You see files like these:

  13. After adding all the jar files, to validate the changes, click OK.
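
Before importing the jar files in steps 11 to 13, it can help to check that the files downloaded from the cluster really are Spark 2.0 artifacts. The following is a minimal sketch only, not part of the Studio: it assumes the jars sit in a local folder and follow the common spark-<module>_<scala version>-<spark version>.jar naming pattern (for example, spark-core_2.11-2.0.0.jar). The function name check_spark_jars and the folder path in the usage note are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming pattern: spark-<module>_<scala version>-<spark version>[suffix].jar
SPARK_JAR = re.compile(r"^(spark-[\w-]+?)_(\d+\.\d+)-(\d+(?:\.\d+)+).*\.jar$")


def check_spark_jars(folder, expected_version="2.0"):
    """Split the jar names in `folder` into Spark jars of the expected version and everything else."""
    matching, other = [], []
    for jar in sorted(Path(folder).glob("*.jar")):
        found = SPARK_JAR.match(jar.name)
        version = found.group(3) if found else ""
        # Accept "2.0" itself or any maintenance release such as "2.0.2".
        if version == expected_version or version.startswith(expected_version + "."):
            matching.append(jar.name)
        else:
            other.append(jar.name)
    return matching, other
```

For example, calling check_spark_jars("downloads/spark-jars") returns two lists: the jars that look like Spark 2.0 artifacts, and the remaining jars to review by hand before adding them in the [Select Libraries] wizard. Jars that do not follow the assumed naming pattern always land in the second list.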