Using the Java or the SQL engine - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-29
Available in:

  • Big Data Platform
  • Cloud API Services Platform
  • Cloud Big Data Platform
  • Cloud Data Fabric
  • Cloud Data Management Platform
  • Data Fabric
  • Data Management Platform
  • Data Services Platform
  • MDM Platform
  • Real-Time Big Data Platform

About this task

After setting the analysis parameters in the analysis editor, you can use either the Java or the SQL engine to execute your analysis.

Analysis results can differ slightly depending on the engine you choose, for example when you use the summary statistics indicators to profile a DB2 database. This is because indicators are computed differently depending on the database type, and because Talend uses special functions when working with Java.
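The source does not say which computations differ between the engines. As a purely illustrative assumption, the following sketch shows one way such a discrepancy can arise: a mean computed with integer arithmetic inside a database versus floating-point arithmetic on the client. It uses an in-memory SQLite database and a hypothetical table `t`, not DB2 and not Talend's actual queries or functions.

```python
import sqlite3

# Hypothetical sample data; not taken from any Talend analysis.
values = [1, 2, 2]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])

# Database side: SQLite divides two integers with integer division,
# so the fractional part of the mean is dropped.
db_mean = conn.execute("SELECT SUM(x) / COUNT(x) FROM t").fetchone()[0]

# Client side: Python's / operator always produces a float.
local_mean = sum(values) / len(values)

print("computed in the database:", db_mean)
print("computed locally:", local_mean)
```

The same data gives two slightly different means only because the arithmetic rules differ, which is the kind of effect to keep in mind when comparing results across engines.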

SQL engine:

If you use the SQL engine to execute a column analysis:

  • An SQL query is generated for each indicator used in the column analysis. The analysis runs multiple indicators in parallel, and results are refreshed in the charts while the analysis is still in progress.

  • Data monitoring and processing are carried out on the DBMS.

  • Only statistical results are retrieved locally.

Using this engine gives you better system performance. You can also access valid and invalid data in the data explorer.
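The SQL engine behavior described above can be sketched as follows. This is a minimal illustration, assuming an in-memory SQLite database, a hypothetical `customers` table, and hypothetical indicator queries. Talend's real query templates are specific to each database and are not reproduced here. For simplicity the queries run one after another rather than in parallel.

```python
import sqlite3

# Hypothetical indicator-to-query mapping: one SQL query per indicator.
INDICATOR_QUERIES = {
    "row_count": "SELECT COUNT(*) FROM customers",
    "null_count": "SELECT COUNT(*) FROM customers WHERE age IS NULL",
    "distinct_count": "SELECT COUNT(DISTINCT age) FROM customers",
    "mean": "SELECT AVG(age) FROM customers",
}


def build_demo_db():
    """Create a small in-memory table to stand in for the profiled database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.executemany(
        "INSERT INTO customers (age) VALUES (?)",
        [(30,), (40,), (None,), (40,), (50,)],
    )
    conn.commit()
    return conn


def run_sql_engine(conn):
    """Run each indicator query on the database and fetch only the scalar result."""
    results = {}
    for name, query in INDICATOR_QUERIES.items():
        # The database does the processing; a single value comes back per indicator.
        results[name] = conn.execute(query).fetchone()[0]
    return results


conn = build_demo_db()
sql_results = run_sql_engine(conn)
print(sql_results)
```

No table rows are transferred to the client in this sketch, only one statistic per indicator, which mirrors the point that only statistical results are retrieved locally.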

Java engine:

If you use the Java engine to execute a column analysis:

  • Only one query is generated for all indicators used in the column analysis.

  • All monitored data is retrieved locally to be analyzed.

  • You can set parameters to decide whether to access the analyzed data and how many data rows to show per indicator. This helps to avoid memory limitation issues, since it is impossible to store all analyzed data.

When you execute the column analysis with the Java engine, you do not need query templates specific to each database. However, system performance is significantly lower than with the SQL engine. Executing the analysis with the Java engine also uses disk space, because all data is retrieved and stored locally. To free up space, you can delete the data stored in the main Talend Studio directory, at Talend-Studio>workspace>project_name>Work_MapDB.
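The Java engine behavior described above can be sketched in a similar way. This is not Talend's implementation: the function `analyze_locally`, its parameters, and the `customers` table are hypothetical names chosen for illustration, and Python stands in for the Java code. A single query fetches the rows, the indicators are computed on the client, and the number of rows kept per indicator for drill down is capped.

```python
import sqlite3


def analyze_locally(conn, allow_drill_down=True, max_rows_kept=50):
    """Fetch rows with one query and compute all indicators on the client."""
    # One query serves every indicator (hypothetical table and columns).
    cursor = conn.execute("SELECT id, age FROM customers ORDER BY id")

    counts = {"row_count": 0, "null_count": 0}
    drill_down = {"row_count": [], "null_count": []}

    def record(indicator, row):
        counts[indicator] += 1
        # Keep at most max_rows_kept rows per indicator to limit memory use.
        if allow_drill_down and len(drill_down[indicator]) < max_rows_kept:
            drill_down[indicator].append(row)

    for row in cursor:
        record("row_count", row)
        if row[1] is None:
            record("null_count", row)

    return counts, drill_down


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO customers (age) VALUES (?)",
    [(30,), (None,), (40,), (None,), (50,)],
)

# A cap of 1 makes the limit visible on this tiny table.
counts, drill_down = analyze_locally(conn, max_rows_kept=1)
print(counts)
print(drill_down)
```

The indicator counts cover every row, while the rows available for drill down stop at the cap. Passing `allow_drill_down=False` in this sketch keeps the counts and stores no rows at all.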

To set the parameters to access analyzed data when using the Java engine, do the following:

Procedure

  1. In the Analysis Parameter section of the column analysis editor, select Java from the Execution engine list.
  2. Select the Allow drill down check box to store locally the data analyzed by the current analysis.
    This check box is usually selected by default.
  3. In the Max number of rows kept per indicator field, enter the number of data rows you want to make accessible.
    This field is set to 50 by default.

Results

You can now run your analysis and access the analyzed data according to the parameters you set.