Additional information about conversions, types, and formats - Cloud

Talend Cloud Pipeline Designer Processors Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Pipeline Designer
Content: Design and Development > Designing Pipelines
Last publication date: 2024-02-26

The Type converter processor allows you to apply multiple conversion operations to an incoming record.

You can convert either primitive data types or semantic data types.

Source and destination types

The input and output records (Avro format) use one of the following data types:
  • Primitive types: null, boolean, int, long, float, double, bytes, string. This category also includes:
    • Complex types: record, enum, array, map, union, fixed

    • Logical types (shown as logical type:underlying Avro type): date:int, time-millis:int, time-micros:long, timestamp-millis:long, timestamp-micros:long, duration:fixed(12), decimal:fixed|bytes

  • Semantic types: predefined semantic types that Talend Cloud suggests when retrieving the fields of a dataset. For more information, read Managing semantic types.

Errors and warnings

Any error stops the pipeline, so make sure your data is compatible with the conversion. The errors you may encounter include:
  • Parsing exceptions caused by invalid DateFormat or DecimalFormat patterns.

  • Exceptions thrown when a source value cannot be parsed or converted with valueOf().

  • Too few source bytes to build the destination value.
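Each error category above corresponds naturally to a standard Java exception. The sketch below illustrates that mapping under stated assumptions: the class name ConversionErrors is hypothetical, and reading bytes through java.nio.ByteBuffer is only one plausible way a byte shortage could surface. It does not call the Type converter itself.

```java
import java.nio.ByteBuffer;
import java.text.DecimalFormat;
import java.time.format.DateTimeFormatter;

// Hypothetical demo: shows which standard Java exception each error category
// most likely corresponds to. No Talend API is involved.
final class ConversionErrors {

    /** Runs the action and returns the simple name of the exception it throws, or "none". */
    static String errorOf(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Bad patterns: an unterminated quoted literal, and two decimal separators.
        System.out.println(errorOf(() -> DateTimeFormatter.ofPattern("yyyy-MM-dd 'at"))); // IllegalArgumentException
        System.out.println(errorOf(() -> new DecimalFormat("#.#.#")));                     // IllegalArgumentException
        // Bad parse/valueOf: "1234.5" is not a valid int literal.
        System.out.println(errorOf(() -> Integer.valueOf("1234.5")));                       // NumberFormatException
        // Not enough bytes: a long needs 8 bytes, but only 2 are available.
        System.out.println(errorOf(() -> ByteBuffer.wrap(new byte[2]).getLong()));           // BufferUnderflowException
    }
}
```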

Date-oriented formats (for primitive types only)

When one of the source or destination values is date/time-oriented and the other is a string, the format is used for the conversion, as described in the Java DateTimeFormatter documentation. If no format is present, the default ISO 8601 format provided by Java is used.

DateTime includes both calendar day and time information.

Warning: The Avro date/time logical types do not include time zone information, so any time zone field in the format must be optional, and no time zone is present in the resulting string. The examples below include time zones for illustration.

Format | String
EEE, MMM d, ''yy 'at' h:mm a | Tue, Nov 28, '17 at 12:44 PM
yyyyy.MMMM.dd GGG hh:mm a | 02017.November.28 AD 12:44 PM

Date and Time use the same formatting rules, with the following restrictions:
  • No field smaller than a day should appear in a Date format, because the Date type has no hour: yyyy-MM-dd

  • No field larger than an hour should appear in a Time format, because the Time type has no day: HH:mm:ss.SSS
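A minimal sketch with java.time.format.DateTimeFormatter, assuming the processor follows the same pattern semantics. It formats 2017-11-28 12:44 with both example patterns, then checks that a date-only value rejects an hour field and a time-only value rejects a day field. DatePatternDemo and its methods are hypothetical names.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Locale;

// Hypothetical demo of date/time patterns; not the Type converter's own code.
final class DatePatternDemo {

    /** Formats a java.time value; Locale.US keeps day/month names and AM/PM in English. */
    static String format(TemporalAccessor value, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(value);
    }

    /** True if the pattern can format the value, false if it asks for a field the value lacks. */
    static boolean supports(TemporalAccessor value, String pattern) {
        try {
            format(value, pattern);
            return true;
        } catch (DateTimeException e) {
            // UnsupportedTemporalTypeException, for example "HH" on a date-only value.
            return false;
        }
    }

    public static void main(String[] args) {
        LocalDateTime dt = LocalDateTime.of(2017, 11, 28, 12, 44);
        System.out.println(format(dt, "EEE, MMM d, ''yy 'at' h:mm a")); // Tue, Nov 28, '17 at 12:44 PM
        System.out.println(format(dt, "yyyyy.MMMM.dd GGG hh:mm a"));    // 02017.November.28 AD 12:44 PM

        LocalDate date = dt.toLocalDate(); // no hour field
        LocalTime time = dt.toLocalTime(); // no day field
        System.out.println(supports(date, "yyyy-MM-dd"));    // true
        System.out.println(supports(date, "yyyy-MM-dd HH")); // false
        System.out.println(supports(time, "HH:mm:ss.SSS"));  // true
        System.out.println(supports(time, "dd HH:mm"));      // false
    }
}
```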

Number formats (for primitive types only)

When one of the source or destination values is numeric and the other is a string, the format is used for the conversion, as described in the Java NumberFormat documentation. If no format is present, the string is parsed with the default Java parsing methods, such as Integer.valueOf().

Format | String
'#'# | #1, #12345, -#123
$#,##0.00;($#,##0.00) | $1,234.56, $0.50, ($1.00), ($1,234.56)
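These rows can be reproduced with java.text.DecimalFormat, which the Examples section names as the pattern engine for numeric conversions. This is a sketch only: NumberPatternDemo is a hypothetical class, and US symbols are assumed so that "," and "." act as the grouping and decimal separators.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demo of DecimalFormat patterns; not the Type converter's own code.
final class NumberPatternDemo {

    static DecimalFormat formatter(String pattern) {
        // US symbols: ',' groups thousands and '.' separates decimals.
        return new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
    }

    static String format(String pattern, double value) {
        return formatter(pattern).format(value);
    }

    /** Parses text with the pattern; wraps the checked ParseException for convenience. */
    static Number parse(String pattern, String text) {
        try {
            return formatter(pattern).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse " + text + " with " + pattern, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("'#'#", 12345)); // #12345
        System.out.println(format("'#'#", -123));  // -#123
        String money = "$#,##0.00;($#,##0.00)";
        System.out.println(format(money, 0.5));      // $0.50
        System.out.println(format(money, -1234.56)); // ($1,234.56)
        System.out.println(parse(money, "($1,234.56)").doubleValue()); // -1234.56
    }
}
```

Because '#'# has no explicit negative subpattern, DecimalFormat places the minus sign in front of the whole positive pattern, prefix included. The second pattern supplies its own negative form, so negative values appear in parentheses instead.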

Some logical rules apply to the conversions. For example:

  • Integer and Long formats that include a decimal point cause an error.

Examples

  • Primitive conversion OK: Widening primitive conversions where no information is lost.

  • Primitive conversion with error: Primitive conversions (widening or narrowing) where information might be lost.

  • Date conversion: The DateFormat pattern, if present, is used for String conversions with date/time types.
    • This applies when the source is a date, time-millis, or timestamp-millis logical type (time-micros and timestamp-micros are treated as long), or when the destination is Date, Time, or DateTime.

    • If no pattern is present, Date, Time, and DateTime types use the following ISO 8601 patterns:
      • Date: yyyy-MM-dd
      • Time: HH:mm:ss
      • DateTime: yyyy-MM-dd'T'HH:mm:ss'Z'
  • Numeric conversion: The DecimalFormat pattern, if present, is used for String conversions with numeric types. If no pattern is present, the conversion falls back to valueOf() or toString() on the matching numeric class, for example Integer.valueOf() or Integer.toString().

  • When converting between supported date-oriented types and numbers, the format isn't used.
    • Date: the incoming/outgoing number is the number of days since 1970-01-01 (int)

    • Time: the incoming/outgoing number is the number of milliseconds since 00:00:00 (int)

    • DateTime: the incoming/outgoing number is the number of milliseconds since 1970-01-01 00:00:00 (long)

  • When both the source and destination are supported date-oriented types, the date and time components are kept consistent between the two. Anything unknown is set relative to 1970-01-01 00:00:00. For example, converting a Time (which has no date component) to Date always returns 1970-01-01.
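A sketch of the number mappings and default ISO 8601 patterns described above, written with java.time. It assumes all values are read in UTC (the literal Z in the DateTime pattern suggests this, but that is an assumption). EpochMappingDemo and its methods are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical demo of the number <-> Date/Time/DateTime mapping; not a Talend API.
final class EpochMappingDemo {

    static final long MILLIS_PER_DAY = 86_400_000L;
    static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss");
    static final DateTimeFormatter TIME_MILLIS = DateTimeFormatter.ofPattern("HH:mm:ss.SSS");
    static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    /** Date: an int number of days since 1970-01-01, printed in ISO yyyy-MM-dd form. */
    static String dateFromDays(int days) {
        // LocalDate.toString() adds a leading '+' to years above 9999.
        return LocalDate.ofEpochDay(days).toString();
    }

    /** Time: an int number of milliseconds since midnight. */
    static String timeFromMillis(int millis, boolean showMillis) {
        LocalTime t = LocalTime.ofNanoOfDay(millis * 1_000_000L);
        return t.format(showMillis ? TIME_MILLIS : TIME);
    }

    /** DateTime: a long number of milliseconds since 1970-01-01 00:00:00 UTC. */
    static String dateTimeFromMillis(long millis) {
        return DATE_TIME.format(Instant.ofEpochMilli(millis));
    }

    /** DateTime -> Date: keep only the day part (floorDiv also handles pre-1970 values). */
    static int daysFromDateTimeMillis(long millis) {
        return (int) Math.floorDiv(millis, MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        System.out.println(dateFromDays(0));                        // 1970-01-01
        System.out.println(dateFromDays(20171128));                 // +57196-09-03
        System.out.println(timeFromMillis(1, false));               // 00:00:00 (milliseconds hidden)
        System.out.println(timeFromMillis(1, true));                // 00:00:00.001
        System.out.println(dateTimeFromMillis(45_862_000L));        // 1970-01-01T12:44:22Z
        System.out.println(daysFromDateTimeMillis(1511873062123L)); // 17498
    }
}
```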

For more information, see the Oracle documentation.
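The primitive rows of the following table behave like ordinary Java casts, as described by the widening and narrowing primitive conversions in the Java Language Specification. The sketch below assumes the processor applies the same casts; PrimitiveCastDemo is a hypothetical name.

```java
// Hypothetical demo: plain Java primitive conversions. That the Type converter
// uses exactly these casts internally is an assumption.
final class PrimitiveCastDemo {

    static long intToLong(int v) { return v; }        // widening: exact
    static int longToInt(long v) { return (int) v; }  // narrowing: keeps the low 32 bits
    static double longToDouble(long v) { return v; }  // widening: rounds to a 53-bit significand

    public static void main(String[] args) {
        System.out.println(intToLong(12345));                // 12345
        System.out.println(longToInt(12345L));               // 12345
        System.out.println(longToInt(1234567890123456789L)); // 2112454933
        System.out.println(longToInt(Long.MIN_VALUE));       // 0
        // Cast back to long to show the exact value the double holds.
        System.out.println((long) longToDouble(1234567890123456789L)); // 1234567890123456768
    }
}
```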

Source type (Avro) | Source value | Format | Destination type | Destination value
int | 12345 | - | Long | Primitive conversion OK: 12345L (a widening conversion does not lose anything)
long | 12345L | - | Integer | Primitive conversion with error: 12345 (narrowing conversions can be OK, usually on data with few significant digits)
long | 1234567890123456789L | - | Integer | Primitive conversion with error: 2112454933 (narrowing conversions can lose data, but in a well-defined way; here, the last four bytes of the long are reinterpreted as an int)
long | 1234567890123456789L | - | Double | Primitive conversion with error: 1234567890123456770.0d (some widening conversions lose precision in a well-defined way)
long | 0x8000000000000000L (MIN_VALUE) | - | Integer | 0 (a narrowing conversion keeps only the last four bytes)
string | "1234.5" | - | Integer | Error: cannot parse a floating-point value without a format
string | "1234.5" | # | Integer | Numeric conversion: 1234 (the part after the decimal point is discarded)
string | "1234.5" | #.# | Integer | Numeric conversion: 1234 (even a format with a decimal point lets the input string be converted into a number)
boolean | false | - | Integer | 0
boolean | true | - | Integer | 1
boolean | false | - | Date | 1970-01-01 (zero days since 1970-01-01)
boolean | true | - | Date | 1970-01-02 (one day since 1970-01-01)
boolean | false | - | Time | 00:00:00.000 (zero milliseconds since midnight)
boolean | true | - | Time | 00:00:00.001 (one millisecond since midnight; if your view does not show milliseconds, this looks exactly like false even though the underlying data differs)
timestamp-millis | 2017-11-28T12:44:22Z | yyyyMMdd | String | Date conversion: 20171128
string | "20171128" | yyyyMMdd | timestamp-millis | Date conversion: 2017-11-28T00:00:00Z (hours, minutes, and seconds are 0)
string | "20171128" | yyyyMMdd | Date | Date conversion: 2017-11-28
int | 20171128 | - | Date | +57196-09-03 (20,171,128 days after 1970-01-01)
time-millis | 12:44:22 | - | DateTime | 1970-01-01T12:44:22Z (the source time has no date part, so 1970-01-01 is used)
timestamp-millis | 2017-11-28T12:44:22Z | - | Date | 2017-11-28 (the time component is removed; the underlying number changes from 1511873062123L to 17498)

Note: The timestamp-millis to String conversion does not work on Test datasets.
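Several string and boolean rows of this table can be approximated with plain Java. This sketch assumes DecimalFormat parsing followed by Number.intValue() for numeric strings, java.time parsing in UTC for date strings, and a 0/1 mapping for booleans. TableRowsDemo and its methods are hypothetical and are not the processor's implementation.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helpers that mimic some rows of the table with plain Java.
final class TableRowsDemo {

    /** string -> Integer without a format: Integer.valueOf throws NumberFormatException on "1234.5". */
    static int toIntegerNoFormat(String s) {
        return Integer.valueOf(s);
    }

    /** string -> Integer with a DecimalFormat pattern: parse to a Number, then truncate. */
    static int toIntegerWithFormat(String s, String pattern) {
        DecimalFormat f = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        try {
            return f.parse(s).intValue(); // 1234.5 becomes 1234
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse " + s + " with " + pattern, e);
        }
    }

    /** boolean -> number: false is 0 and true is 1 (read as days for Date, milliseconds for Time). */
    static int booleanToNumber(boolean b) {
        return b ? 1 : 0;
    }

    /** string -> timestamp-millis: missing time fields default to midnight, assumed UTC. */
    static Instant toTimestamp(String s, String pattern) {
        LocalDate d = LocalDate.parse(s, DateTimeFormatter.ofPattern(pattern));
        return d.atStartOfDay(ZoneOffset.UTC).toInstant();
    }

    /** timestamp-millis -> string with a pattern, read in UTC. */
    static String fromTimestamp(Instant ts, String pattern) {
        return DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC).format(ts);
    }

    public static void main(String[] args) {
        System.out.println(toIntegerWithFormat("1234.5", "#"));   // 1234
        System.out.println(toIntegerWithFormat("1234.5", "#.#")); // 1234
        System.out.println(booleanToNumber(true));                // 1
        System.out.println(toTimestamp("20171128", "yyyyMMdd"));  // 2017-11-28T00:00:00Z
        System.out.println(fromTimestamp(Instant.parse("2017-11-28T12:44:22Z"), "yyyyMMdd")); // 20171128
        try {
            toIntegerNoFormat("1234.5");
        } catch (NumberFormatException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```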

Note: The Int to String conversion is not supported by Talend.