Verifying default values in a retrieved schema

EnrichVersion
6.5
EnrichProdName
Talend Big Data Platform
Talend Big Data
Talend Data Integration
Talend Data Services Platform
Talend ESB
Talend Open Studio for Data Integration
Talend Data Fabric
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
Talend MDM Platform
Talend Data Management Platform
Talend Open Studio for Big Data
task
Data Quality and Preparation > Third-party systems > Database components
Design and Development > Third-party systems > Database components
Data Governance > Third-party systems > Database components
EnrichPlatform
Talend Studio

Verifying default values in a retrieved schema

This article helps you avoid syntax errors that might happen by reusing default values of a schema retrieved from a database table.

When retrieving the schema of a database table, Talend Studio automatically retrieves the default value of each field and handles their types accordingly. For example, the Studio encloses a string value within single quotation marks but leaves a function or an expression as it is. However, it might happen that a function or expression from a given database fails to be recognized, thus is treated as a string and enclosed within single quotation marks. In this situation, a syntax error or an unexpected execution result will be returned when you are using that function or expression to create database tables.

Environment

You may encounter this kind of errors in all versions of Talend Studio.

Symptoms/Description

The following example demonstrates this kind of syntax error issue.

In an Oracle database, there is a table called T1:

CREATE TABLE T1
 (    ID NUMBER(5,0) NOT NULL ENABLE,
      DESCRIPTION VARCHAR2(50) DEFAULT 'no_description',
      CREATE_DATE DATE DEFAULT sysdate,
      CREATE_BY VARCHAR2(10) DEFAULT user
 )

In this table, the data type of the CREATE_DATE column is DATE and the default value is set to get the current system date. The data type of the CREATE_BY column is VARCHAR2 and the default value is set to get the logged author.

When the table schema is retrieved into Talend Studio, the result is as follows:

The default values of the CREATE_DATE and the CREATE_BY columns are both misinterpreted as String and enclosed within single quotation marks. Although this misinterpretation does not affect reading data from or writing data into the table, it will lead to unexpected result when you use this retrieved schema to create a table with the tCreateTable component or a database output component such as tOracleOutput in a Job, because this misinterpretation makes the table structure in the CREATE statement different from the original true one.

Use the following Job to illustrate more details. This Job reflects a general data integration process that migrates a table from one database to another database and creates the target table with the same name using the tOracleOutput component. Both the input and output components are using the retrieved schema mentioned above.

In the generated code of the Job, the CREATE statement generated from the schema of the tOracleOutput component reads:

stmtCreate_tOracleOutput_1
        .execute("CREATE TABLE "
                + tableName_tOracleOutput_1
                + "(ID NUMBER(5,0)   not null ,DESCRIPTION VARCHAR2(50)  default 'no_description' ,CREATE_DATE DATE default 'sysdate' ,CREATE_BY VARCHAR2(10)  default 'user' )");

The default values of the CREATE_DATE and the CREATE_BY columns are already different from those of the original table. The inaccurate default values in this statement will lead to either failure of the Job execution or generation of unexpected values for these two columns.

Resolution

Talend Studio retrieves all the table information and the default values when retrieving a table schema. However, from a technical point of view, it is not always possible to know whether a default value is a function or a constant in every case. In the example described above, the default value of the CREATE_BY column in the original table is a function reading "user" that returns any user being logged, while because of the single quotation marks in the generated statement, the string "user" will be returned instead during the execution.

To avoid this problem, you need to verify by yourself that a function or expression default value has been properly handled after being retrieved. If it is not, you must remove manually the single quotation marks. In this example, you have to remove the single quotation marks around the default values of the CREATE_DATE and the CREATE_BY columns.