Testing Spark Jobs using test cases - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-29
Available in: Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform

The test framework described in Testing Jobs and Services using test cases also applies to Spark Jobs. During Continuous Integration development, use it to make sure that a Spark Job will function as expected when it is actually executed to handle large datasets.

To create a Spark test case, follow the same steps as detailed in Testing Jobs and Services using test cases, but be aware that a different Test Skeleton is dedicated to Spark Jobs.

Spark Job representing a Spark Test Skeleton.

By default, a Spark Test Skeleton includes:

  • one or more tFixedFlowInput components, depending on the number of input flows in the Job, to load the input file(s),

  • for a Spark Streaming Job (available in Cloud Data Fabric, Data Fabric and Real-Time Big Data Platform), one or more tBoundedStreamInput components instead, depending on the number of input flows in the Job, to load the input file(s),

  • the read-only INPUT and OUTPUT icons that are used to indicate the beginning and the end of the part to be tested,

  • one or more tCollectAndCheck components, depending on the number of output flows in the Job, to compare the temporary output file(s) with the reference file(s). The test succeeds if the files in each compared pair are identical and fails otherwise.
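The pass/fail rule described for the output check can be illustrated outside Talend Studio. The following is a minimal sketch in plain Python, not the tCollectAndCheck implementation: it assumes that "identical" means byte-for-byte equality, and the function names files_identical and all_pairs_match are hypothetical names chosen for this illustration.

```python
from pathlib import Path


def files_identical(output_path: Path, reference_path: Path) -> bool:
    """Return True when the two files have exactly the same bytes.

    Assumption: "identical" is read as strict byte equality, so a change in
    row order or line endings counts as a difference.
    """
    return output_path.read_bytes() == reference_path.read_bytes()


def all_pairs_match(pairs) -> bool:
    """Return True only if every (output, reference) pair is identical.

    One pair corresponds to one output flow of the Job under test.
    """
    return all(files_identical(output, reference) for output, reference in pairs)
```

With one pair per output flow, a single mismatching pair is enough for all_pairs_match to return False, which mirrors the rule that the test fails unless the compared files are identical.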

In addition, the Local mode is used by default in the Spark configuration tab. Depending on the number of input and output flows, context variables are automatically created to specify the input and reference files. In the Basic settings tab of tFixedFlowInput or tBoundedStreamInput, the Use context variable radio button is automatically selected so that you can choose which of these new context variables to use.
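The relationship between flows and context variables can be pictured as a simple mapping: one variable per input flow pointing at an input file, and one per output flow pointing at a reference file. The sketch below is only an illustration in plain Python. The variable names input_1 and reference_1 and the function build_context are hypothetical and are not the names Talend Studio generates.

```python
def build_context(input_files, reference_files):
    """Map hypothetical context variable names to file paths, one per flow."""
    context = {}
    # One variable per input flow (names are illustrative only).
    for index, path in enumerate(input_files, start=1):
        context[f"input_{index}"] = path
    # One variable per output flow, pointing at its reference file.
    for index, path in enumerate(reference_files, start=1):
        context[f"reference_{index}"] = path
    return context
```

For a Job with one input flow and two output flows, build_context(["in.csv"], ["ref_a.csv", "ref_b.csv"]) yields three entries, one for the input file and two for the reference files.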

Before creating a test case for a Job, make sure that all the components of your Job have been configured.

For further information about Continuous Integration and how you can implement it with Talend, see the Software Development Life Cycle best practices guide.