About this task
After setting the analysis parameters in the analysis editor, you can use either the Java or the SQL engine to execute your analysis.
The choice of engine can slightly change analysis results in some cases, for example when you use the summary statistics indicators to profile a DB2 database. This is because indicators are computed differently depending on the database type, and because Talend uses special functions when working with Java.
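To illustrate what "computed differently" can mean, the following small Python sketch shows that two legitimate definitions of the same summary statistic can return different values on identical data. This is the kind of discrepancy that can appear between a database's built-in function and a Java implementation. It is only an illustration built on Python's standard statistics module; it does not reproduce what DB2 or Talend actually compute.

```python
import statistics

# The same four values, summarized with two common median definitions.
values = [10, 20, 30, 40]

# Interpolated median: the mean of the two middle values.
interpolated = statistics.median(values)
# "Low" median: the smaller of the two middle values, never interpolated.
low = statistics.median_low(values)

# Quartiles also depend on the interpolation method chosen.
data = [1, 2, 3, 4, 5, 6, 7, 8]
exclusive_q = statistics.quantiles(data, n=4, method="exclusive")
inclusive_q = statistics.quantiles(data, n=4, method="inclusive")

print(interpolated, low)  # 25.0 20
print(exclusive_q)        # [2.25, 4.5, 6.75]
print(inclusive_q)        # [2.75, 4.5, 6.25]
```

Both results in each pair are valid, so a profiling result that differs slightly between engines is not necessarily an error.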
If you use the SQL engine to execute a column analysis:
an SQL query is generated for each indicator used in the column analysis; the analysis runs multiple indicators in parallel, and the results are refreshed in the charts while the analysis is still in progress,
data monitoring and processing is carried out on the DBMS,
only statistical results are retrieved locally.
This engine generally gives better system performance. You can also access valid and invalid data in the data explorer.
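The SQL engine behavior described above (one query per indicator, queries run in parallel, only statistical results returned) can be sketched in Python with the standard sqlite3 module standing in for the DBMS. The indicator names, the SQL templates, the customers table and the helper functions are all hypothetical; Talend generates its own database-specific queries.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical indicator -> SQL template mapping (one query per indicator).
INDICATOR_QUERIES = {
    "row_count": "SELECT COUNT(*) FROM {table}",
    "null_count": "SELECT COUNT(*) FROM {table} WHERE {col} IS NULL",
    "distinct_count": "SELECT COUNT(DISTINCT {col}) FROM {table}",
}


def make_demo_db(db_path):
    # Hypothetical sample table standing in for the profiled database.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE customers (city TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?)",
        [("Paris",), ("Lyon",), (None,), ("Paris",)],
    )
    conn.commit()
    conn.close()


def run_indicator(db_path, name, template, table, col):
    # Each worker opens its own connection. The database does the counting,
    # and only a single scalar value comes back to the client.
    # (Identifiers are assumed to be trusted; they are not escaped here.)
    conn = sqlite3.connect(db_path)
    try:
        value = conn.execute(template.format(table=table, col=col)).fetchone()[0]
    finally:
        conn.close()
    return name, value


def profile_in_database(db_path, table, col):
    # Submit one query per indicator and run them in parallel.
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(run_indicator, db_path, name, tpl, table, col)
            for name, tpl in INDICATOR_QUERIES.items()
        ]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_demo_db(path)
        print(profile_in_database(path, "customers", "city"))
        # {'row_count': 4, 'null_count': 1, 'distinct_count': 2}
```

Because each query returns only an aggregate, the amount of data transferred stays small regardless of the table size.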
If you use the Java engine to execute a column analysis:
only one query is generated for all indicators used in the column analysis,
all monitored data is retrieved locally to be analyzed,
you can set parameters that decide whether the analyzed data can be accessed and how many data rows to show per indicator. This helps to avoid memory limitation issues, since it is not possible to store all the analyzed data.
When you execute the column analysis with the Java engine, you do not need database-specific query templates. However, system performance is significantly lower than with the SQL engine. Executing the analysis with the Java engine also uses disk space, because all data is retrieved and stored locally. To free up space, you can delete the data stored in the main Studio directory, at Talend-Studio>workspace>project_name>Work_MapDB.
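For contrast, the Java engine approach (a single query that retrieves the raw column values, with every indicator then computed locally) can be sketched as follows. Python and an in-memory sqlite3 database are stand-ins chosen for illustration; the customers table and the function names are hypothetical, and Talend's local storage in Work_MapDB is not modeled.

```python
import sqlite3


def make_demo_connection():
    # Hypothetical sample table standing in for the profiled database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (city TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?)",
        [("Paris",), ("Lyon",), (None,), ("Paris",)],
    )
    conn.commit()
    return conn


def profile_locally(conn, table, col):
    # One query for all indicators: it only selects the raw column values.
    # (Identifiers are assumed to be trusted; they are not escaped here.)
    values = [row[0] for row in conn.execute(f"SELECT {col} FROM {table}")]
    # Every value is now held on the client, so memory use grows with the
    # number of rows retrieved; all indicators are computed from this list.
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
    }


if __name__ == "__main__":
    conn = make_demo_connection()
    print(profile_locally(conn, "customers", "city"))
    # {'row_count': 4, 'null_count': 1, 'distinct_count': 2}
    conn.close()
```

The query itself is plain and portable, which is why no database-specific templates are needed, but the cost moves from the DBMS to the client.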
To set the parameters for accessing analyzed data when using the Java engine, do the following:
In the Analysis Parameter view of the column analysis editor, select Java as the execution engine.
Select the Allow drill down check box to store locally the data that will be analyzed by the current analysis. This check box is usually selected by default.
In the Max number of rows kept per indicator field, enter the number of data rows you want to make accessible. This field is set to 50 by default.
You can now run your analysis and then have access to the analyzed data according to the set parameters.
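The effect of the two settings can be pictured as a bounded store that keeps at most a fixed number of sample rows per indicator. The following Python class is a hypothetical sketch: DrillDownStore, offer and rows are illustrative names that mirror the Allow drill down check box and the Max number of rows kept per indicator field, not Talend APIs, and Talend's actual storage is not modeled.

```python
from collections import defaultdict


class DrillDownStore:
    """Hypothetical store keeping at most max_rows sample rows per indicator."""

    def __init__(self, allow_drill_down=True, max_rows=50):
        self.allow_drill_down = allow_drill_down  # mirrors "Allow drill down"
        self.max_rows = max_rows                  # mirrors the 50-row default
        self._rows = defaultdict(list)

    def offer(self, indicator, row):
        # Keep the row only if drill-down is enabled and the cap is not reached.
        if not self.allow_drill_down:
            return False
        bucket = self._rows[indicator]
        if len(bucket) >= self.max_rows:
            return False
        bucket.append(row)
        return True

    def rows(self, indicator):
        # Return a copy so callers cannot modify the stored sample.
        return list(self._rows.get(indicator, []))


if __name__ == "__main__":
    store = DrillDownStore(allow_drill_down=True, max_rows=50)
    for i in range(1000):
        store.offer("null_count", {"id": i, "city": None})
    print(len(store.rows("null_count")))  # 50
```

Capping each indicator's sample keeps memory and disk use bounded even when an indicator matches millions of rows, at the cost of showing only the first rows encountered.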