Testing Spark Jobs using test cases - Cloud

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-04-16
Available in: Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform

The test framework described in Testing Jobs and Services using test cases also applies to Spark Jobs. Use it during Continuous Integration development to make sure that a Spark Job will function as expected when it is actually executed to handle large datasets.

To create a Spark test case, follow the same steps detailed in Testing Jobs and Services using test cases, but be aware that a different Test Skeleton is dedicated to Spark Jobs.

Figure: Spark Job representing a Spark Test Skeleton.

By default, a Spark Test Skeleton includes:

one or more tFixedFlowInput components, depending on the number of input flows in the Job, to load the input file(s); in a Spark Streaming Job (available in Cloud Data Fabric, Data Fabric, and Real-Time Big Data Platform), tBoundedStreamInput components are used instead,
the read-only INPUT and OUTPUT icons, which indicate the beginning and the end of the part to be tested,
one or more tCollectAndCheck components, depending on the number of output flows in the Job, to compare the temporary output file(s) with the reference file(s). The test succeeds if the files in each compared pair are identical, and fails otherwise.

In addition, the Local mode is used by default in the Spark configuration tab. Depending on the number of input and output flows, context variables are automatically created to specify the input and reference files. In the Basic settings tab of tFixedFlowInput or tBoundedStreamInput, the Use context variable radio button is selected automatically so that you can choose which of these new context variables to use.
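The flow that the skeleton sets up (load the input file, run the part to be tested, compare the temporary output file with the reference file) can be sketched in plain Java. This is only an illustration and not the code that Talend Studio generates or the implementation of tCollectAndCheck. The class name SkeletonSketch, both method names, and the strict, order-sensitive line comparison are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only: not Talend's generated code and not
// the implementation of tCollectAndCheck.
final class SkeletonSketch {

    // Plays the role of the comparison step: returns true only when both
    // files contain exactly the same lines in the same order (assumption).
    static boolean filesIdentical(Path output, Path reference) throws IOException {
        List<String> actual = Files.readAllLines(output, StandardCharsets.UTF_8);
        List<String> expected = Files.readAllLines(reference, StandardCharsets.UTF_8);
        return actual.equals(expected);
    }

    // Mirrors the skeleton: load the input file, run the part under test,
    // write a temporary output file, then compare it with the reference file.
    static boolean runTestCase(Path input, Path reference,
                               UnaryOperator<List<String>> partUnderTest) throws IOException {
        List<String> rows = Files.readAllLines(input, StandardCharsets.UTF_8);
        Path tempOutput = Files.createTempFile("test-output", ".txt");
        try {
            Files.write(tempOutput, partUnderTest.apply(rows), StandardCharsets.UTF_8);
            return filesIdentical(tempOutput, reference);
        } finally {
            // The temporary output is only needed for the comparison.
            Files.deleteIfExists(tempOutput);
        }
    }
}
```

With this sketch, a call such as SkeletonSketch.runTestCase(input, reference, rows -> rows) returns true only when the input file and the reference file hold the same lines, which corresponds to a test case whose tested part leaves the data unchanged.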

Note: Before creating a test case for a Job, make sure that all the components of the Job have been configured.

For further information about Continuous Integration and how you can implement it with Talend, see the Software Development Life Cycle best practices guide.