Using the Java or the SQL engine - 6.5

Talend Open Studio for MDM User Guide

After setting the analysis parameters in the analysis editor, you can use either the Java or the SQL engine to execute your analysis.

The choice of the engine can sometimes slightly change analysis results, for example when you select the summary statistics indicators to profile a DB2 database. This is because indicators are computed differently depending on the database type, and also because Talend uses special functions when working with Java.

SQL engine:

If you use the SQL engine to execute a column analysis:

  • an SQL query is generated for each indicator used in the column analysis. The analysis runs multiple indicators in parallel, and results are refreshed in the charts while the analysis is still in progress,

  • data monitoring and processing are carried out on the DBMS,

  • only statistical results are retrieved locally.

Using this engine gives you better system performance. You can also access valid and invalid data in the data explorer. For more information, see Viewing and exporting analyzed data.
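The SQL engine behavior described above can be illustrated with a minimal sketch. It is not Talend's implementation: the indicator names, the query templates, the `customers` table and the `profile_with_sql` and `make_demo_db` helpers are all assumptions made for illustration, and an in-memory SQLite database stands in for the DBMS. The queries run one after the other here for simplicity, not in parallel.

```python
import sqlite3

# Hypothetical indicator-to-SQL templates (assumption, not Talend's query templates).
INDICATOR_QUERIES = {
    "row_count": "SELECT COUNT(*) FROM {table}",
    "null_count": "SELECT COUNT(*) FROM {table} WHERE {column} IS NULL",
    "distinct_count": "SELECT COUNT(DISTINCT {column}) FROM {table}",
    "mean": "SELECT AVG({column}) FROM {table}",
}


def profile_with_sql(conn, table, column):
    """Run one aggregate query per indicator; only the statistic is fetched."""
    results = {}
    for name, template in INDICATOR_QUERIES.items():
        query = template.format(table=table, column=column)
        # The database computes the value; a single number comes back.
        results[name] = conn.execute(query).fetchone()[0]
    return results


def make_demo_db():
    """Build a small in-memory table used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (age INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?)",
        [(20,), (30,), (30,), (None,), (40,)],
    )
    return conn


if __name__ == "__main__":
    print(profile_with_sql(make_demo_db(), "customers", "age"))
```

Because each statistic is computed inside the database, the amount of data sent to the client does not grow with the size of the table.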

Java engine:

If you use the Java engine to execute a column analysis:

  • only one query is generated for all indicators used in the column analysis,

  • all monitored data is retrieved locally to be analyzed,

  • you can set parameters to decide whether to access the analyzed data and how many data rows to show per indicator. This helps avoid memory limitations, since it is not possible to store all analyzed data.

When you execute the column analysis with the Java engine, you do not need query templates specific to each database. However, system performance is significantly lower than with the SQL engine. Executing the analysis with the Java engine also uses disk space, because all data is retrieved and stored locally. To free up space, you can delete the data stored in the main studio directory, at Talend-Studio>workspace>project_name>Work_MapDB.
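The Java engine approach, in which a single query retrieves the data and the indicators are computed locally, can be sketched as follows. This is an illustration only, not Talend's code: it is written in Python, the `profile_locally` and `make_demo_db` helpers and the `customers` table are hypothetical, and an in-memory SQLite database stands in for the analyzed database.

```python
import sqlite3
from statistics import mean


def profile_locally(conn, table, column):
    """Fetch the column with one query, then compute every indicator locally."""
    # One generic SELECT serves all indicators, so no per-database
    # aggregate templates are needed, but every row crosses the connection.
    rows = conn.execute(f"SELECT {column} FROM {table}").fetchall()
    values = [row[0] for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "mean": mean(non_null) if non_null else None,
    }


def make_demo_db():
    """Build a small in-memory table used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (age INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?)",
        [(20,), (30,), (30,), (None,), (40,)],
    )
    return conn


if __name__ == "__main__":
    print(profile_locally(make_demo_db(), "customers", "age"))
```

The sketch shows the trade-off in miniature: the query is portable across databases, while memory use and transfer time grow with the number of rows retrieved.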

To set the parameters to access analyzed data when using the Java engine, do the following:

  1. In the Analysis Parameter view of the column analysis editor, select Java from the Execution engine list.

  2. Select the Allow drill down check box to locally store the data analyzed by the current analysis.

    This check box is usually selected by default.

  3. In the Max number of rows kept per indicator field, enter the number of data rows you want to make accessible.

    This field is set to 50 by default.

You can now run your analysis and then have access to the analyzed data according to the set parameters. For more information, see Viewing and exporting analyzed data.
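To show how the two parameters set above could interact, here is a minimal sketch of a per-indicator row cap. It is a conceptual model only, not the studio's storage mechanism: the `DrillDownStore` class, its method names and the indicator label are hypothetical, and only the default of 50 comes from the procedure above.

```python
from collections import defaultdict


class DrillDownStore:
    """Keeps at most a fixed number of analyzed rows per indicator."""

    def __init__(self, allow_drill_down=True, max_rows_per_indicator=50):
        if max_rows_per_indicator < 0:
            raise ValueError("max_rows_per_indicator must not be negative")
        self.allow_drill_down = allow_drill_down
        self.max_rows_per_indicator = max_rows_per_indicator
        self._rows = defaultdict(list)

    def record(self, indicator, row):
        """Store the row if drill down is on and the cap is not reached."""
        if not self.allow_drill_down:
            return False
        kept = self._rows[indicator]
        if len(kept) >= self.max_rows_per_indicator:
            return False  # Cap reached: the row is analyzed but not kept.
        kept.append(row)
        return True

    def rows_for(self, indicator):
        """Return a copy of the rows kept for one indicator."""
        return list(self._rows.get(indicator, []))


if __name__ == "__main__":
    store = DrillDownStore()
    for i in range(200):
        store.record("null_count", (i,))
    print(len(store.rows_for("null_count")))
```

Capping the rows per indicator bounds the memory and disk space used for drill down, regardless of how many rows the analysis reads.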