Different profiling results when running column analyses with the Java and the SQL engines - Cloud - 7.3

Talend Studio User Guide

Available in...

Big Data Platform

Cloud API Services Platform

Cloud Big Data Platform

Cloud Data Fabric

Cloud Data Management Platform

Data Fabric

Data Management Platform

Data Services Platform

MDM Platform

Real-Time Big Data Platform

The profiling results of column analyses that use the Week Frequency and Week Low Frequency indicators may differ between the Java and the SQL engines.

Environment

Talend Open Studio for Data Quality and all platform Studios with data quality.

Description

You may get different results when you run a column analysis with the Week Frequency and Week Low Frequency indicators, depending on whether you use the Java or the SQL engine.

This is because date functions may behave differently between database management systems (DBMS), or even between different installations of the same DBMS.

Take the MySQL WEEK(date[,mode]) function as an example; for further information, see Date and Time Functions. This function returns the week number for date. Its two-argument form lets you specify whether:

  • the week starts on Sunday or Monday,
  • the return value should be in the range from 0 to 53 or from 1 to 53.
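The effect of these two choices can be sketched in Java. The example below is only an approximation of two MySQL modes built on java.time.temporal.WeekFields; the class and method names are hypothetical, and it is not the code that Talend Studio or MySQL runs. It assumes that mode 0 means weeks start on Sunday with week 1 being the first week that starts in the year, and that mode 1 means weeks start on Monday with week 1 being the first week that has four or more days in the year.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.WeekFields;

// Hypothetical helper, for illustration only.
class WeekModeSketch {

    // Approximates WEEK(date, 0): Sunday start, a full first week is required,
    // and days before the first Sunday of the year fall in week 0.
    static int weekMode0(LocalDate date) {
        return date.get(WeekFields.of(DayOfWeek.SUNDAY, 7).weekOfYear());
    }

    // Approximates WEEK(date, 1): Monday start, week 1 is the first week
    // with at least four days in the year.
    static int weekMode1(LocalDate date) {
        return date.get(WeekFields.of(DayOfWeek.MONDAY, 4).weekOfYear());
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2008, 2, 20);
        System.out.println("mode 0: " + weekMode0(date)); // mode 0: 7
        System.out.println("mode 1: " + weekMode1(date)); // mode 1: 8
    }
}
```

The same date lands in a different week depending on the mode, so a frequency table grouped by week number changes with it.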

With the SQL engine, Talend Studio uses the function WEEK(date). Because the mode argument is omitted, the DBMS applies its configured default mode. In MySQL, this is the value of the default_week_format system variable, which is 0 unless your installation changes it.

However, if you need to change this default behavior, you can create a UDI (User Defined Indicator) in which you specify the mode you want to use in the SQL query template.

With the Java engine, Talend Studio calls Locale.getDefault() to determine the two settings listed above (the first day of the week and how the first week is numbered) and gets the results from the Java API. This means it uses the locale of Talend Studio and not the locale of the DBMS.

This explains why you may get different profiling results between the Java and the SQL engines for the Week Frequency and Week Low Frequency indicators and for other date functions.
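As a minimal sketch of how the locale affects week numbering in the Java API, the example below computes the week number of one date under two locales. It uses java.time.temporal.WeekFields as a stand-in; the exact Java calls that Talend Studio makes are not stated here, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class WeekNumberByLocale {

    // Week number of the date under the week rules of the given locale
    // (first day of the week and minimal days in the first week).
    static int weekOfYear(LocalDate date, Locale locale) {
        return date.get(WeekFields.of(locale).weekOfWeekBasedYear());
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2021, 1, 3); // a Sunday

        // US rules: weeks start on Sunday, so this date opens week 2.
        System.out.println("US:      " + weekOfYear(date, Locale.US));
        // French (ISO) rules: weeks start on Monday, so this date
        // still belongs to the last week of the previous year.
        System.out.println("FRANCE:  " + weekOfYear(date, Locale.FRANCE));
        // What an application relying on the JVM default would see.
        System.out.println("default: " + weekOfYear(date, Locale.getDefault()));
    }
}
```

Running the Studio on machines with different default locales can therefore change the week assigned to the same date, independently of any database setting.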

Related Jira Issues

DOCT-4674