What is the difference between a Joblet and the tRunJob component
Both Joblets and the tRunJob component encourage code reuse and refactoring, improve development efficiency, and ease maintenance.
However, you may wonder what the difference between them is, and when you should use one or the other. This article explains the differences between a Joblet and the tRunJob component from a technical point of view as well as from a usage angle.
The tRunJob component is a generic component available in the core product, Talend Open Studio for Data Integration, whereas Joblets are an advanced feature that is only available in Talend Enterprise subscription products.
This article applies mostly to Talend Enterprise subscription product users.
Talend Studio uses a Java code generator: each Job is translated to a Java class. From a technical point of view, there are two differences:
- The tRunJob component executes a child Job, which is a separate Java class. The main Job instantiates the child Job and runs it through the tRunJob component. A Joblet is just a GUI extraction and refactoring of some components. It creates a reusable transformation, but the generated code of the Joblet is still part of the Java class of the main Job.
- A child Job called with the tRunJob component is a different unit of execution and has its own context variables. It can't access the context variables of the main Job directly; values have to be passed to it explicitly. A Joblet, however, can access the context variables of the main Job, because it is part of the main Job.
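The two differences above can be illustrated with a minimal plain-Java sketch. It is not the code Talend Studio generates: the class names MainJobSketch and ChildJobSketch, the map-based context, and the method names are all assumptions made for illustration. It only shows that logic living inside the main class can read the main context, while a separately instantiated child sees only the values handed to it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a child Job: a separate class with its own context.
class ChildJobSketch {
    // The child's own context; it does not contain the parent's variables.
    private final Map<String, String> context = new HashMap<>();

    // Values reach the child only when the caller passes them explicitly.
    String run(Map<String, String> passedParams) {
        context.putAll(passedParams);
        return "child sees email=" + context.get("email");
    }
}

// Hypothetical stand-in for the main Job class.
class MainJobSketch {
    final Map<String, String> context = new HashMap<>();

    // Joblet-style logic lives inside the main class, so it reads the main context directly.
    String jobletPart() {
        return "joblet sees email=" + context.get("email");
    }

    // tRunJob-style call: a new child instance that receives only what is passed.
    String callChild(boolean passEmail) {
        Map<String, String> params = new HashMap<>();
        if (passEmail) {
            params.put("email", context.get("email"));
        }
        return new ChildJobSketch().run(params);
    }
}

class ContextScopeSketch {
    public static void main(String[] args) {
        MainJobSketch mainJob = new MainJobSketch();
        mainJob.context.put("email", "a@example.com");
        System.out.println(mainJob.jobletPart());     // joblet sees email=a@example.com
        System.out.println(mainJob.callChild(false)); // child sees email=null
        System.out.println(mainJob.callChild(true));  // child sees email=a@example.com
    }
}
```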
Because of the differences between a Joblet and the tRunJob component in the code refactoring and function, the decision of when to use a Joblet or the tRunJob component is based on business requirements. The following explanation describes the circumstances which could lead you to choose one or the other.
When to use a Joblet
The Joblet code is automatically included in the generated code of the main Job, which uses fewer resources and improves performance. A Joblet is usually used to meet the following needs:
- Output or print static messages. Sometimes you want to trace the Job execution and print a static message at each step. For example, create a Joblet that uses a tJava component to print this message at the beginning of the Job:
  System.out.println("The job starts to run");
- Load the values of context variables from a file or a database. If one or more Jobs load the values of context variables from a file or a database, you should usually create a dedicated Joblet to accomplish this task.
- Manage custom logs with a tLogCatcher component or a tStatCatcher component as the first component in the Jobs.
- Create a reusable transformation regardless of the type of input and output data source. For example, if a Job reads data from both a file and a database and both flows need the same processing, you can put that processing in a Joblet.
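As an illustration of the context-loading use case in the list above, here is a minimal plain-Java sketch of what such a shared step does conceptually: it reads key=value pairs and makes them available as context values. The class name ContextLoadSketch, the method loadContext, and the variable names db_host and db_port are hypothetical, and the sketch uses java.util.Properties rather than Talend components.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class ContextLoadSketch {
    // Parses key=value lines into a Properties object that stands in for the Job context.
    static Properties loadContext(String keyValueText) {
        Properties context = new Properties();
        try {
            context.load(new StringReader(keyValueText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return context;
    }

    public static void main(String[] args) {
        // Assumed file content; a real Job would read this from a file or a database.
        String fileContent = "db_host=localhost\ndb_port=3306\n";
        Properties context = loadContext(fileContent);
        System.out.println(context.getProperty("db_host") + ":" + context.getProperty("db_port"));
    }
}
```

Keeping this logic in one place means every Job that needs the same context values reuses the same loading step instead of duplicating it.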
For more information about Joblets, see the Talend User Guides on Talend Help Center.
When to use the tRunJob component
The tRunJob component helps you manage complex Job systems in real projects. It is usually used to meet the following needs:
- Split a complex Job into standalone Jobs, which keeps it readable by avoiding too many sub-jobs in one Job. You can create separate Jobs for different business requirements, and then create a main Job that runs these child Jobs through tRunJob components.
For example, if you are building a data warehouse for retail, you can populate the fact tables (such as users, product, and orders) and the dimension tables in different Jobs, and then create a main Job that runs the child Jobs one by one.
- Keep a Job running when individual records fail. When a Job reads data from a data source and processes each record in a component, problematic data may cause that component to throw a Java exception, and the Job then stops running. The tRunJob component is the only solution in this case: you move the processing to a child Job, capture the Java exception with a tLogCatcher component, log it to your database or file, and let the main Job continue with the next record.
Example: Reading email addresses from a table and sending an email to each person
In this example, a table stores the email information.
The requirement is to read the email addresses from the table and send an email to each person with a tSendMail component. However, the table may contain invalid email addresses, and if you put all the components in one Job, the Job stops as soon as an invalid address reaches the tSendMail component. To meet this requirement, design the Jobs as follows.
- The tMysqlInput_1 component reads emails from the table.
- The tFlowToIterate_1 component iterates each email.
- The tRunJob_1 component calls the child Job.
- The tSendMail_1 component sends an email to each person.
- The tLogCatcher_1 component catches the Java exception and logs it into a table via the tMysqlOutput_1.
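The control flow of this design can be sketched in plain Java as follows. This is only an illustration under assumptions, not the generated Job code: the names EmailLoopSketch, childJob, and parentJob are hypothetical, the validity check is a deliberately naive placeholder, and the logging done by tLogCatcher_1 and tMysqlOutput_1 is collapsed into a single catch block that appends to an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EmailLoopSketch {
    // Stand-in for the child Job: fails on an invalid address,
    // similar to a tSendMail component with Die on error selected.
    static void childJob(String email) {
        if (email == null || !email.contains("@")) {
            throw new IllegalArgumentException("Invalid email address: " + email);
        }
        // Actual sending is omitted in this sketch.
    }

    // Stand-in for the parent Job: iterate over the addresses, call the child
    // for each one, log any failure, and continue with the next address.
    static List<String> parentJob(List<String> emails) {
        List<String> errorLog = new ArrayList<>();
        for (String email : emails) {
            try {
                childJob(email);
            } catch (RuntimeException e) {
                // Comparable to clearing Die on child error: record the failure and go on.
                errorLog.add(e.getMessage());
            }
        }
        return errorLog;
    }

    public static void main(String[] args) {
        List<String> emails = Arrays.asList("a@example.com", "not-an-email", "b@example.com");
        System.out.println(parentJob(emails)); // [Invalid email address: not-an-email]
    }
}
```

The point of the split is that a failure is contained in one child call, so the loop in the parent reaches the remaining addresses.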
Configure the Parent Job
Before you begin
- You have configured the parent Job as described previously.
- You have defined the context variable called email in the
- Double-click the tRunJob_1 to open its Basic settings view.
- In the Job field, select the Job to be called, in this example childJob1.
- Clear the Die on child error check box so that the parent Job does not stop even if an error occurs in the child Job.
- In the Context Param table, pass the current email from the main Job to the child Job.
Note: For more information, see Passing a value from a parent Job to a child Job.
- Click the [+] button to add the parameter defined in the Context tab of the child Job, in this example email.
- Define its value, (String)globalMap.get("row1.email").
- Press Ctrl+S to save your Job.
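The value expression used in the Context Param table, (String)globalMap.get("row1.email"), is an ordinary Java map lookup followed by a cast. The sketch below shows the idea in isolation. It assumes that globalMap behaves like a Map<String, Object> and that the iterate step stores the current value under the key row1.email; the class name GlobalMapSketch and the method currentEmail are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalMapSketch {
    // Looks up the current email; the cast is needed because the map stores plain Objects.
    static String currentEmail(Map<String, Object> globalMap) {
        return (String) globalMap.get("row1.email");
    }

    public static void main(String[] args) {
        // Assumption: the iteration step has stored the current row's value under this key.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("row1.email", "a@example.com");

        // This is the value that would be handed to the child Job's email context variable.
        System.out.println(currentEmail(globalMap)); // a@example.com
    }
}
```

If the key is absent, the lookup returns null, so the key must match the row and column names used in the parent Job exactly.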
Configure the Child Job
Before you begin
- You have configured the child Job as described previously.
- In the Basic settings tab of the tSendMail_1, in the To field, enter the context variable that stores the current email passed from the parent Job, in this example context.email.
- Select the Die on error check box.
Note: This option makes the child Job throw a Java exception that will be captured by the tLogCatcher component when an email address is invalid.
- Press Ctrl+S to save your Job.