Designing a Storm Job - 6.2

Talend Real-time Big Data Platform Studio User Guide


A Storm Job is designed the same way as any other Talend Job, but using the dedicated Storm interface and Storm components. In addition, the components must be arranged according to a simple template.

The rest of this section details this template through the series of actions required to create a Storm Job, in other words, a Storm topology.

  1. Once the Studio is launched, in the Repository of the Integration perspective, right-click the Storm Jobs node under Job designs, or right-click the Job designs node itself if the Storm Jobs node does not exist yet. Then, from the contextual menu, select Create Storm Job to create an empty Job in the workspace.

  2. In the workspace, type in tKafkaInput to display it in the contextual component list and select it.

    The tKafkaInput component is the message input component of a Storm Job. It allows the Job to call the Zookeeper service used by your Kafka cluster.

    Note that tRowGenerator and tFixedFlowInput are also available as input components that you can use to test the Job being created.

  3. Add other components available in the Storm components Palette to process the messages depending on the operations you want the Job to perform. Then connect them using the Row > Main link.

  4. At the end of the Job you are designing, add the tJDBCOutput component to write the processed data into a given system.

    You can use tLogRow in place of tJDBCOutput to output the data into the console of the Run view of this Job.
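The four steps above always follow the same input, process, output template. As a conceptual illustration only, not Talend or Storm code, the dataflow of such a topology can be sketched in plain Python (all function and message names here are hypothetical):

```python
# Conceptual sketch of the Storm Job template: an input component
# (tKafkaInput in the steps above) emits messages, the intermediate
# components transform them, and an output component (tJDBCOutput
# or tLogRow) writes the result. All names are hypothetical.

def kafka_input():
    # Stands in for tKafkaInput: in a real topology, messages come
    # from the Kafka cluster located through its Zookeeper service.
    yield from ["alice;3", "bob;5"]

def process(message):
    # Stands in for the processing components wired with Row > Main links.
    name, count = message.split(";")
    return {"name": name, "count": int(count)}

def output(record, sink):
    # Stands in for tJDBCOutput (database) or tLogRow (console).
    sink.append(record)

sink = []
for message in kafka_input():
    output(process(message), sink)

print(sink)
```

The key point the sketch illustrates is that a Storm Job is a continuous message pipeline: every component between the input and the output receives rows through a Row > Main link, transforms them, and passes them on.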

The following image presents a Storm Job that is not ready for production but can already be run in a Storm cluster to test the transformation actions.

Before you can run a Storm Job, you still need to configure its connection to the Storm cluster to be used and define the actions for submitting, killing and monitoring this Job (topology), in the Storm configuration tab of the Run view of this Job.
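For context, the submit, kill and monitor actions defined in this tab correspond to operations that Storm's own command-line client also exposes. The equivalent Storm CLI commands are shown below for reference; the JAR path, class and topology name are placeholders:

```
storm jar my_topology.jar my.main.Class   # submit a topology to the cluster
storm list                                # monitor: list running topologies
storm kill my_topology_name               # kill a running topology
```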

For further explanations of each parameter in this view, see the scenario described in the tKafkaInput section of Talend Components Reference Guide.

Note that the connection created from this Storm configuration view is effective on a per-Job basis; therefore, when you need to run another Job, you have to configure a connection specific to that Job.

For more details about the components mentioned in this section, see the related sections in Talend Components Reference Guide.

For a detailed scenario running a Storm Job, see the tKafkaInput section of Talend Components Reference Guide.

You can remotely manage and execute your Storm Jobs from Talend Administration Center. For further information, see Talend Administration Center User Guide.

If you need to deploy and execute a Storm Job on a server independently of the Studio, you can use the Build Job feature to export this Job. For further information about this Build Job feature, see How to build Jobs.

You can also create these types of Jobs by writing their Job scripts in the Jobscript view and then generating the Jobs accordingly.