Dropping and linking Spark components

Talend Data Fabric Getting Started Guide

Version: 7.3
Language: English
Operating system: Data Fabric
Product: Talend Data Fabric
Module: Talend Administration Center, Talend DQ Portal, Talend Installer, Talend Runtime, Talend Studio
Content: Data Quality and Preparation > Cleansing data; Data Quality and Preparation > Profiling data; Design and Development; Installation and Upgrade
Last publication date: 2023-07-24

In the Job workspace, you arrange and link the Spark Batch components to design a data transformation process that runs in the Apache Spark Batch framework.

Before you begin

- You have launched Talend Studio and opened the Integration perspective.
- An empty Job has been created as described in Creating the Spark Batch Job and is open in the workspace.

Procedure

1. In the Job workspace, type the name of each component to be used and select it from the list that appears. In this scenario, the components are two tFileInputDelimited components, a tMap component, two tFileOutputParquet components and a tAzureFSConfiguration component.
   - The tFileInputDelimited components load the movie data and the director data, respectively, from DBFS (Databricks File System) on your Databricks Big Data platform into the data flow of the current Job.
   - The tMap component transforms the input data.
   - The tFileOutputParquet components write the results to a directory in your Azure Data Lake Storage system.
   - The tAzureFSConfiguration component provides the information needed to connect to your Azure Data Lake Storage system.
2. Double-click one of the two tFileInputDelimited components to make its label editable, then enter movie to change the label of this component.
3. Do the same to label the other tFileInputDelimited component director.
4. Right-click the tFileInputDelimited component labelled movie, select Row > Main from the contextual menu, and click tMap to connect the two components. This is the main link through which the movie data is sent to tMap.
5. Do the same to connect the director tFileInputDelimited component to tMap using the Row > Main link. Because tMap already has a main input, this link becomes the Lookup link through which the director data is sent to tMap as lookup data.
6. Do the same to connect the tMap component to one of the tFileOutputParquet components using the Row > Main link, then in the pop-up wizard, name this link out1 and click OK to validate this change.
7. Repeat these operations to connect the tMap component to the other tFileOutputParquet component using the Row > Main link, and name this link reject.
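The following plain-Python sketch, which is not code generated by Talend Studio, illustrates what the main link, the lookup link and the two outputs amount to in data terms. It assumes hypothetical semicolon-delimited sample records, a hypothetical `director_id` join column, and that the reject output collects movies with no matching director; in the actual Job, the join and the reject rules are whatever you configure in tMap.

```python
import csv
import io

# Hypothetical sample data standing in for the movie and director files on DBFS.
MOVIES = """movie_id;title;director_id
1;Movie A;10
2;Movie B;20
3;Movie C;99
"""

DIRECTORS = """director_id;name
10;Director X
20;Director Y
"""


def read_delimited(text, delimiter=";"):
    """Parse delimited text into a list of dicts, one dict per record."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def map_movies(movies, directors):
    """Join the main flow (movies) against the lookup flow (directors).

    Assumption (hypothetical): matched rows go to out1 and
    movies without a matching director go to reject.
    """
    # The lookup flow is held in memory, keyed on the hypothetical join column.
    lookup = {d["director_id"]: d for d in directors}
    out1, reject = [], []
    for movie in movies:  # each main-flow row probes the lookup once
        director = lookup.get(movie["director_id"])
        if director is None:
            reject.append(movie)
        else:
            out1.append({"title": movie["title"], "director": director["name"]})
    return out1, reject


if __name__ == "__main__":
    out1, reject = map_movies(read_delimited(MOVIES), read_delimited(DIRECTORS))
    print("out1:", out1)
    print("reject:", reject)
```

Holding the lookup rows in a dictionary and probing it once per main row reflects, in simplified form, the different roles of the movie (main) and director (lookup) inputs.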

Results

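In the workspace, the whole Job looks like this: the movie and director tFileInputDelimited components are linked to tMap, and tMap is linked to the two tFileOutputParquet components through the out1 and reject links. The tAzureFSConfiguration component is not linked to any other component.

[Figure: the whole Job in the workspace]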
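As an illustrative cross-check of the finished layout, the sketch below records the Job's components and links as plain Python data and verifies that tMap receives exactly one main and one lookup input and that tAzureFSConfiguration stays unlinked. The component names such as `tMap_1` and `tFileOutputParquet_1`, the tuple format and the `check_job` helper are hypothetical; this is not how Talend Studio stores Jobs.

```python
from collections import Counter

# Hypothetical component names mapped to their component types.
COMPONENTS = {
    "movie": "tFileInputDelimited",
    "director": "tFileInputDelimited",
    "tMap_1": "tMap",
    "tFileOutputParquet_1": "tFileOutputParquet",
    "tFileOutputParquet_2": "tFileOutputParquet",
    "tAzureFSConfiguration_1": "tAzureFSConfiguration",
}

# (source, link type, link name, target); None where the procedure names no link.
LINKS = [
    ("movie", "Main", None, "tMap_1"),
    ("director", "Lookup", None, "tMap_1"),
    ("tMap_1", "Main", "out1", "tFileOutputParquet_1"),
    ("tMap_1", "Main", "reject", "tFileOutputParquet_2"),
]


def check_job(components, links):
    """Return a list of problems found in the hypothetical Job layout."""
    problems = []
    linked = {s for s, _, _, _ in links} | {t for _, _, _, t in links}
    for name, kind in components.items():
        # The configuration component stands alone; every other one must be linked.
        if kind == "tAzureFSConfiguration" and name in linked:
            problems.append(f"{name} should not be linked")
        if kind != "tAzureFSConfiguration" and name not in linked:
            problems.append(f"{name} is not linked")
    # tMap should receive exactly one main input and one lookup input.
    tmap_inputs = Counter(kind for _, kind, _, t in links if components[t] == "tMap")
    if tmap_inputs != Counter({"Main": 1, "Lookup": 1}):
        problems.append(f"tMap inputs are {dict(tmap_inputs)}")
    return problems


if __name__ == "__main__":
    print(check_job(COMPONENTS, LINKS) or "Job layout matches the procedure")
```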