Troubleshooting common Talend data type issues
This article explains how to avoid common issues when working with Talend data types.
For more information about data type mapping in Talend Studio, see Type mapping.
Regular data types
Numbers
The Java and database vendors offer different numeric implementations. The JDBC driver should make the transition without any data loss.
However, there are types where this is not necessarily possible. Float or double floating point numbers cannot hold precise values. Mathematical operations on these numbers will yield different results based on the operation order. A good practice is to summarize the small numbers then add the bigger ones.
Another common mistake is when using globalMap
. You need to use the
Class instead of the primitive.
int
>Integer
double
>Double
When converting a String to a number, you could use the following:
Integer.valueOf
/Integer.parseInt
Double.valueOf
/Double.parseDouble
Date
java.util.Date
and java.sql.Date
hold
timestamps. The tLogRow component displays these timestamps based on the pattern you
configure. If the pattern only contains YYYY
, the underlying data
object will still contain more information. Timezones are one example of a piece of
information that is often overlooked, which can result in dates moving when you don't
expect them to.
For example, one database returns a date as 2021-03-10 Midnight
, but the
other database treats it as 2021-03-10 Midnight in Central Europe
GMT+2
, then converts it to UTC to store it and transforms it into a date type
which results in 2021-03-09
after ingestion. This is often seen as a
bug, but there is a logical explanation.
The database stores the object as 2021-03-10
.
This goes to Java, where it becomes (depending on the conversion used):
2021-03-10 00:00:00.000 UTC
2021-03-10 02:00:00.000 GMT+2
Then it goes to the database, which treats it as:
2021-03-10 00:00:00.000 GMT+2
2021-03-09 22:00:00.000 UTC
The conversion to UTC will cause the date to move up a day. Because of this, when you have date-related issues, you should always analyze the entire date, including the timezone, hours, and seconds.
Precision
Since Talend
uses java.util.Date
, the micro and nanosecond precision is not
available. However, some components might check if the incoming Object type is
java.sql.Timestamp
. In that case, the nanoseconds are available
as well. To use this, you need to select the Object type in the schema definition,
and the source and target components need to support it.
Experiments worth execution
You can try the following ways to troubleshoot any issue:
- Use a tFileOutput component to dump and read these values. This way, you can control the timezone better.
- Use the
-Duser.timezone
parameter to see if this changes the behavior. For more information, see Time Zone Settings in the JRE. - Consult the JDBC driver documentation to learn how to handle these timezones.
- Use a tLogRow component with the following pattern and
result:
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX" 2001-07-04T12:08:56.235-07:00
Then the target database may or may not be able to store the timestamp part of the previous examples, which is combined with the timezone. This could lead to days shifting.
Possible workarounds
Use Strings and see if the databases or their driver can handle the transformation. With String, what you see is what you get, and there is not extra hidden information.
Convert between timezones using Java code in a tJavaRow component:
String pattern = "yyyy-MM-dd";
log.info(TalendDate.formatDate(pattern, input_row.startDate));
output_row.startDateUTC = TalendDate.parseDateInUTC("yyyy-MM-dd zzz",
TalendDate.formatDate("yyyy-MM-dd", input_row.startDate)+" UTC");
For more information about data behavior, see Your Calendrical Fallacy Is....
byte[] / binary
This data type is most commonly used with images, BLOB, and other binary data.
However, some components might handle the binary format in Base64 encoded format
represented as Strings. The most common mistake related to this type is when Object is
used instead, which displays something like @[aaa123
in the logs,
because the Object ID instead of the Object value is returned when
.toString()
is used on an Object.
Object
Another name for this data type could be Other
. It can represent any
data type in Java that is explained above. This can be useful to represent some special
types, for example, timestamp.
However, to access all the available information, you must cast it to its original type.
Comparison
==
format.
However, other data types, such as String, should be compared using the
.equals()
or .compareTo()
functions.
Common mistakes:
context.value == "foo"
will not work.
context.value.equals("foo")
could lead to NullPointerException. However,
"foo".equals(context.value)
will result in false in case the value
is null.
Special data types
Snowflake - Geography type
Snowflake supports geography type. However, Talend does not support it out of the box.
For more information about converting Strings to geography type, see the Snowflake documentation Geospatial Data Types.
By automatically applying the TO_GEOGRAPHY function, you can create a schema and test the data using a tFixedFlowInput component.
By defining your input schema as shown above and having the geography data represented by the following String:
POINT(-122.35 37.55)
The Snowflake table is defined like this:
create table "SAMPLEGEO" (str1 text, str2 geography, int1 number);
Using the tSnowflakeOutputBulkExec component results in the following:
Notice that the Talend String type is converted to geography in Snowflake.
If you are using String or Object data types, it is always worth checking if the special or exotic data types can be loaded into the database.