Debugging Talend Jobs - 7.3

Version: 7.3
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for ESB, Talend Real-Time Big Data Platform
Platform: Talend Studio
Content type: task
Topic: Design and Development > Designing Jobs

Debugging Talend Jobs: overview

A key step in software development is testing your code for errors and determining their cause. Talend offers several built-in debugging capabilities, as well as Job design and logging strategies, to help you find the cause of errors and fix them.

You can debug your Talend Jobs in the following ways.
  • In the Traces Debug mode
  • In the Java Debug mode
  • Using the Log4j feature
  • Using the tLogRow component
  • Using the tJavaRow component

This article introduces each of these debugging methods.

The Traces Debug mode

In the Traces Debug mode, Talend Studio lets you monitor data row by row as it flows between the components of a subJob. You can activate or deactivate Traces, and choose which processed columns appear in the traces table displayed on the design workspace when the current Job is launched. You can monitor the whole data processing run, step through it row by row, or pause at a given breakpoint.

Note: The Job concerned in the following steps is for demonstration only. You can debug your own Jobs in the same way.
To debug the Job in the Traces Debug mode:
  1. With your Job open in Talend Studio, open the Run view and then select Debug Run.
  2. Click the Traces Debug button. The Job runs, with row content displayed under each main > row link.
    Note: Click the triangle in the right part of the button and select Traces Debug if Traces Debug does not appear on the button.
You can watch the content of the rows as they flow through the subJob until it encounters an error.
If an error occurs because of bad data, you can examine the values of each field in the row that caused the error. In this case, the error reported is a null pointer exception, as shown below.
Using the Traces Debug View, you can see that the null value in the Amount2 field probably caused the error.

This is an excellent way of debugging Jobs when you are working with a small sample of data. However, debugging Jobs that handle large volumes of data in this mode is not recommended because displaying every row visually in the Studio console can slow down the execution of the Job. In this case, you can try the Java Debug mode.
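
Conceptually, a trace breakpoint is a condition evaluated against each row as it passes along a link. The plain-Java sketch below only illustrates that idea; it is not code generated by Talend Studio, and the `Row` type, its fields, and the `traceUntil` helper are hypothetical names chosen for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class TraceBreakpointDemo {
    // Hypothetical row with the two amount fields used in this article's example.
    static final class Row {
        final int id;
        final Integer amount1;
        final Integer amount2;

        Row(int id, Integer amount1, Integer amount2) {
            this.id = id;
            this.amount1 = amount1;
            this.amount2 = amount2;
        }
    }

    // Prints each row as it "flows" and stops at the first row that matches
    // the breakpoint condition. Returns that row's index, or -1 if none matches.
    static int traceUntil(List<Row> rows, Predicate<Row> breakpoint) {
        for (int i = 0; i < rows.size(); i++) {
            Row r = rows.get(i);
            System.out.println("trace: " + r.id + " | " + r.amount1 + " | " + r.amount2);
            if (breakpoint.test(r)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, 10, 20),
                new Row(2, 30, null),
                new Row(3, 50, 60));
        // Breakpoint condition (an assumption for this sketch): Amount2 is null.
        int hit = traceUntil(rows, r -> r.amount2 == null);
        System.out.println("Paused at row index " + hit);
    }
}
```

Every row up to the breakpoint is printed, which is also why tracing becomes slow on large data volumes.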

The Java Debug mode

In the example given in the previous section, you can also debug the Job by looking at the Java code that Talend Studio generates, so as to locate the error returned by Java (in this case, a null pointer exception). In this example, notice the line number (780) in the first line of the error message.
By opening the Code tab in Talend Studio and going to the line that caused the error (780), you can infer from the screenshot below that one of the fields, Amount1 or Amount2, contains null data. This is especially helpful when a component has many fields and you want to identify which of them is causing the null pointer exception.
Note: This debugging method assumes that you are familiar with Java programming.
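
To make this kind of failure concrete, here is a minimal sketch of how a null field can raise a null pointer exception in Java. It is not the code that Talend Studio generates: the `Row` class, the addition of the two amounts, and the helper names are assumptions made for illustration. The point is that unboxing a null `Integer` in an arithmetic expression throws `NullPointerException`, and the stack trace gives only the line number, not the field.

```java
class NullAmountDemo {
    // Hypothetical row holding the two nullable fields from the example.
    static final class Row {
        final Integer amount1;
        final Integer amount2;

        Row(Integer amount1, Integer amount2) {
            this.amount1 = amount1;
            this.amount2 = amount2;
        }
    }

    // Assumed mapping expression: adding the two amounts.
    // Auto-unboxing a null Integer here throws NullPointerException.
    static int unsafeTotal(Row row) {
        return row.amount1 + row.amount2;
    }

    // Null-safe variant; treating a missing amount as 0 is an assumption.
    static int safeTotal(Row row) {
        int a1 = row.amount1 == null ? 0 : row.amount1;
        int a2 = row.amount2 == null ? 0 : row.amount2;
        return a1 + a2;
    }

    // Names the first null field, which the stack trace alone does not reveal.
    static String firstNullField(Row row) {
        if (row.amount1 == null) return "Amount1";
        if (row.amount2 == null) return "Amount2";
        return null;
    }

    public static void main(String[] args) {
        Row bad = new Row(100, null);
        try {
            unsafeTotal(bad);
        } catch (NullPointerException e) {
            System.out.println("NPE caused by null field: " + firstNullField(bad));
        }
        System.out.println("Null-safe total: " + safeTotal(bad));
    }
}
```

When a single expression reads several fields, inspecting the variables at a breakpoint on the failing line is usually the quickest way to see which one is null.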

To debug the Job in the Java Debug mode:

  1. With your Job open in Talend Studio, open the Run view and then select Debug Run.
  2. Click the Java Debug button. The Job runs and Talend Studio switches to the Debug view, where a Java code view is created in the workspace. The Java code view contains the Java code of the Job generated by Talend Studio.
    Note: Click the triangle in the right part of the button and select Java Debug if Java Debug does not appear on the button.
  3. Debug the Job in the Debug view. You can set breakpoints and inspect/watch the values of the fields passing through the data flow.

The Log4j feature

Talend supports Log4j on all components, which provides a technical logging mechanism. For more information on how to enable it for all components, see Customizing log4j output level at runtime in the Talend Help Center.

After Log4j is enabled, you can set the level of logging needed for the Job. Log4j provides these logging levels, in order of increasing severity: Trace, Debug, Info, Warn, Error, and Fatal.
Note: To use the Log4j feature, you need to enable the feature in the Log4j pane of the Project Settings dialog box in Talend Studio.
To check the data values flowing through your subJob, set Log4jLevel to Trace. The data for all of the rows then displays in the console.
As in the previous examples, if one of the fields has a null value that leads to a null pointer exception, you'll be able to see it in the console log.
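
The effect of the chosen level can be modeled in a few lines of plain Java. This sketch deliberately does not use the Log4j API; the `Level` enum, the `isEnabled` and `emitted` helpers, and the sample messages are hypothetical. It only shows the threshold rule: a message is written when its level is at least as severe as the configured level, so row data logged at Trace appears only when the level is set to Trace.

```java
import java.util.ArrayList;
import java.util.List;

class LogLevelDemo {
    // The six levels named above, ordered from least to most severe.
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR, FATAL }

    // A message is emitted when it is at least as severe as the threshold.
    static boolean isEnabled(Level threshold, Level message) {
        return message.compareTo(threshold) >= 0;
    }

    // Returns the sample messages that would reach the console at a threshold.
    static List<String> emitted(Level threshold) {
        List<String> out = new ArrayList<>();
        if (isEnabled(threshold, Level.TRACE)) out.add("TRACE row data: 100|null");
        if (isEnabled(threshold, Level.INFO)) out.add("INFO Job started");
        if (isEnabled(threshold, Level.ERROR)) out.add("ERROR NullPointerException");
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Level TRACE: " + emitted(Level.TRACE));
        System.out.println("Level INFO:  " + emitted(Level.INFO));
    }
}
```

Because Trace is the least severe level, it produces the most output, so it is best reserved for short diagnostic runs.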

The tLogRow component

tLogRow is one of the easiest and most commonly used components in Talend Studio. While it is mostly used for adding runtime logging information, it can also be used for debugging a Talend Studio Job.

In the Job shown below, the output from the tMap component is passed to a tLogRow component, which logs the row data in the console.
For tLogRow to log row data in this subJob, modify the tMap configuration by clearing the Die on error check box, so that the Job does not terminate on a failed row.
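
The difference that the Die on error setting makes can be sketched in plain Java as follows. This is a simplified model, not Talend-generated code: the `run` method, the two-column row layout, the addition of the two amounts, and the `REJECTED` marker are assumptions for illustration. With the flag set, the first bad row stops processing; with it cleared, the bad row is noted and the remaining rows are still logged.

```java
import java.util.ArrayList;
import java.util.List;

class DieOnErrorDemo {
    // Processes rows of {amount1, amount2} and returns the lines that a
    // logging step would print. The mapping expression is assumed to be a sum.
    static List<String> run(Integer[][] rows, boolean dieOnError) {
        List<String> log = new ArrayList<>();
        for (Integer[] row : rows) {
            try {
                int total = row[0] + row[1]; // unboxing a null value throws NPE
                log.add(row[0] + "|" + row[1] + "|" + total);
            } catch (NullPointerException e) {
                if (dieOnError) {
                    throw e; // the whole run stops on the first failed row
                }
                // Hypothetical marker so the failed row stays visible in the log.
                log.add("REJECTED " + row[0] + "|" + row[1]);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Integer[][] rows = { {1, 2}, {100, null}, {3, 4} };
        for (String line : run(rows, false)) {
            System.out.println(line);
        }
    }
}
```

Keeping the run alive past the failed row means the console shows the rows on both sides of the problem, which makes the bad record easier to locate.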