Additional information about conversions, types and formats - Cloud

Talend Cloud Pipeline Designer Processors Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Design and Development > Designing Pipelines
EnrichPlatform: Talend Pipeline Designer

The Type Converter processor allows you to apply multiple conversion operations to an incoming record.

Source and destination types

The incoming record is Avro, so the source type is guaranteed to be representable using one of the following Avro types:
  • Primitives: null, boolean, int, long, float, double, bytes, string

  • Complex types: record, enum, array, map, union, fixed

  • Logical types (can always be safely considered as the underlying primitive/complex type): date:int, time-millis:int, time-micros:long, timestamp-millis:long, timestamp-micros:long, duration:fixed(12), decimal:fixed|bytes

The destination types were chosen to be familiar to the end user and correspond closely with the Avro types:
  • Boolean, Integer, Long, Float, Double, String, Date, Time:time-millis, and DateTime:timestamp-millis.

Errors and warnings

Every error stops the pipeline, so you are responsible for ensuring that your data is compatible with the conversion. The types of errors you may encounter include:
  • Parsing exceptions caused by invalid DateFormat or DecimalFormat patterns.

  • Exceptions thrown by parse or valueOf conversions when a source value cannot be converted.

  • Not enough source bytes to create a destination value.
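
The first two kinds of error can be reproduced with plain Java. The sketch below is not the processor's implementation: it assumes the conversions rely on the standard java.time and java.text classes, and the class name ConversionErrors and its helper method are hypothetical.

```java
import java.text.DecimalFormat;
import java.time.format.DateTimeFormatter;

class ConversionErrors {

    // Runs the action and returns the simple name of the exception it throws,
    // or "none" if it completes normally.
    static String failure(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Invalid date pattern: the quoted literal is never closed.
        System.out.println(failure(() -> DateTimeFormatter.ofPattern("yyyy-MM-dd'T")));
        // Invalid number pattern: two decimal separators.
        System.out.println(failure(() -> new DecimalFormat("#.#.#")));
        // valueOf conversion on a string that is not a valid integer.
        System.out.println(failure(() -> Integer.valueOf("1234.5")));
        // A valid conversion, for comparison.
        System.out.println(failure(() -> Integer.valueOf("1234")));
    }
}
```

In a pipeline, each of these exceptions would surface as an error that stops the run, so it is worth validating patterns and sample values before deploying.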

Date-oriented formats

When either the source or destination value is a date/time-oriented value AND the other is a string, the format is used in the conversion, as described in the DateTimeFormatter documentation. If no format is present, the default ISO 8601 format provided with Java is used.

DateTime includes both calendar day and time information.

Warning: The Avro date/time LogicalTypes do not include time zone information, so this must be optional in the format, and will not be present in the String. The examples below include time zones for illustration.

| Format | String |
|---|---|
| EEE, MMM d, ''yy 'at' h:mm a | Tue, Nov 28, '17 at 12:44 PM |
| yyyyy.MMMM.dd GGG hh:mm a | 02017.November.28 AD 12:44 PM |
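
To check a pattern before using it in the processor, you can try it against java.time.format.DateTimeFormatter directly. This is a minimal sketch, assuming US English month and day names; the class name DateTimePatternExample and its methods are hypothetical and not part of the product.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class DateTimePatternExample {

    // Formats a date-time with the given pattern, using US English names.
    static String format(LocalDateTime value, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(value);
    }

    // Parses a string with the given pattern back into a date-time.
    static LocalDateTime parse(String text, String pattern) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.US));
    }

    public static void main(String[] args) {
        LocalDateTime value = LocalDateTime.of(2017, 11, 28, 12, 44);

        // First pattern from the examples above.
        System.out.println(format(value, "EEE, MMM d, ''yy 'at' h:mm a"));
        // prints: Tue, Nov 28, '17 at 12:44 PM

        // Round trip through a pattern without any time zone field.
        String text = format(value, "yyyy-MM-dd'T'HH:mm:ss");
        System.out.println(text + " -> " + parse(text, "yyyy-MM-dd'T'HH:mm:ss"));
    }
}
```

Neither pattern in the sketch contains a time zone field, which matches the constraint that the Avro logical types carry no zone information.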

Date and Time use the same formatting rules, with the following restrictions:
  • No field smaller than a day should appear in a Date format. There is no "hour" in the Date type: yyyy-MM-dd

  • No field larger than an hour should appear in a Time format. There is no "day" in the Time type: HH:mm:ss.SSS
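
The reason for these restrictions can be seen in java.time, where a date-only value has no hour field to format. The following sketch uses LocalDate and LocalTime as stand-ins for the Date and Time destination types; that mapping is an assumption, and the class name DateAndTimePatterns is hypothetical.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

class DateAndTimePatterns {

    // Formats a date-only value; the pattern must not contain time fields.
    static String formatDate(LocalDate date, String pattern) {
        return DateTimeFormatter.ofPattern(pattern).format(date);
    }

    // Formats a time-only value; the pattern must not contain date fields.
    static String formatTime(LocalTime time, String pattern) {
        return DateTimeFormatter.ofPattern(pattern).format(time);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2017, 11, 28);
        LocalTime time = LocalTime.of(12, 44, 22, 123_000_000);

        System.out.println(formatDate(date, "yyyy-MM-dd"));    // 2017-11-28
        System.out.println(formatTime(time, "HH:mm:ss.SSS"));  // 12:44:22.123

        // An hour field in a Date pattern cannot be resolved and is rejected.
        try {
            formatDate(date, "yyyy-MM-dd HH");
        } catch (DateTimeException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```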

Number formats

When either the source or destination value is a numeric value AND the other is a string, the format is used in the conversion, as described in the NumberFormat documentation. If no format is present, the string is parsed using the default Java conversion for the numeric type, for example Integer.valueOf().

| Format | String |
|---|---|
| '#'# | #1, #12345, #-123 |
| $#,##0.00;($#,##0.00) | $1,234.56, $0.50, ($1.00), ($1,234.56) |
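
Number patterns can be tried out with java.text.DecimalFormat before they are used in the processor. The sketch below assumes US symbols (comma for grouping, period as decimal separator); the class name NumberPatternExample and its methods are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

class NumberPatternExample {

    // Builds a formatter with fixed US symbols so results do not depend
    // on the default locale of the machine.
    static DecimalFormat formatter(String pattern) {
        return new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
    }

    // Number -> String using the pattern.
    static String format(double value, String pattern) {
        return formatter(pattern).format(value);
    }

    // String -> number using the same pattern.
    static double parse(String text, String pattern) throws ParseException {
        return formatter(pattern).parse(text).doubleValue();
    }

    public static void main(String[] args) throws ParseException {
        // Positive and negative sub-patterns are separated by ';'.
        String accounting = "$#,##0.00;($#,##0.00)";
        System.out.println(format(1234.56, accounting));       // $1,234.56
        System.out.println(format(0.5, accounting));           // $0.50
        System.out.println(format(-1, accounting));            // ($1.00)
        System.out.println(parse("($1,234.56)", accounting));  // -1234.56

        // A quoted '#' is a literal character, not a digit placeholder.
        System.out.println(format(12345, "'#'#"));             // #12345
    }
}
```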

Some logical rules apply to the conversions:

  • For example, Integer and Long formats that include a decimal point will cause an error.

Examples

  • Widening primitive conversions where no information is lost.

  • Primitive conversions (widening or narrowing) where information might be lost.

  • The DateFormat pattern, if present, is used for String conversions with date/time types.
    • This applies if the source is a logical type date, time-millis or timestamp-millis (time-micros and timestamp-micros are treated as long), or if the destination is Date, Time or DateTime.

    • If no pattern is present, Date/Time/DateTime types use specific ISO-8601 patterns.
      • Date: yyyy-MM-dd
      • Time: HH:mm:ss
      • DateTime: yyyy-MM-dd'T'HH:mm:ss'Z'
  • The DecimalFormat pattern, if present, is used for String conversions with numeric types. If no pattern is present, the conversion falls back to the valueOf() or toString() method of the appropriate type, for example Integer.valueOf() or Integer.toString().

  • When converting between supported date-oriented types and numbers, the format isn't used.
    • Date: the incoming/outgoing number is the number of days since 1970-01-01 (int)

    • Time: the incoming/outgoing number is the number of milliseconds since 00:00:00 (int)

    • DateTime: the incoming/outgoing number is the number of milliseconds since 1970-01-01 00:00:00 (long)

  • When both the source and destination are supported date-oriented types, the date and time components are kept consistent between the two. Anything unknown is set relative to 1970-01-01 00:00:00. For example, converting a Time (with no date component) to Date always returns 1970-01-01.

For more information, see the Oracle documentation.
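
The numeric representations described above map directly onto epoch-based values in java.time. The following sketch illustrates the arithmetic only. It assumes UTC for DateTime values and uses LocalDate, LocalTime and LocalDateTime as stand-ins for the Date, Time and DateTime types; the class name EpochConversions and its methods are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;

class EpochConversions {

    // Date from int: number of days since 1970-01-01.
    static LocalDate dateFromDays(int days) {
        return LocalDate.ofEpochDay(days);
    }

    // Time from int: number of milliseconds since 00:00:00.
    static LocalTime timeFromMillis(int millis) {
        return LocalTime.ofNanoOfDay(millis * 1_000_000L);
    }

    // DateTime from long: milliseconds since 1970-01-01 00:00:00 (UTC assumed).
    static LocalDateTime dateTimeFromMillis(long millis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }

    // Time -> DateTime: the unknown date part is set to 1970-01-01.
    static LocalDateTime timeToDateTime(LocalTime time) {
        return LocalDateTime.of(LocalDate.ofEpochDay(0), time);
    }

    // DateTime -> Date: the time component is dropped.
    static LocalDate dateTimeToDate(LocalDateTime dateTime) {
        return dateTime.toLocalDate();
    }

    public static void main(String[] args) {
        System.out.println(dateFromDays(17498));                  // 2017-11-28
        System.out.println(timeFromMillis(45_862_000));           // 12:44:22
        System.out.println(dateTimeFromMillis(1511873062123L));   // 2017-11-28T12:44:22.123
        System.out.println(timeToDateTime(LocalTime.of(12, 44, 22)));
        System.out.println(dateTimeToDate(dateTimeFromMillis(1511873062123L)).toEpochDay());
    }
}
```

The last line prints 17498, the day count that remains once the time component of the timestamp is removed.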

| Source type (Avro) | Source value | Format | Destination type | Destination value |
|---|---|---|---|---|
| int | 12345 | - | Long | 12345L (widening conversion does not lose anything) |
| long | 12345L | - | Integer | 12345 (narrowing conversions can be OK, usually on data with few significant digits) |
| long | 1234567890123456789L | - | Integer | 2112454933 (narrowing conversions can lose data, but in a well-defined way. In this case, the last four bytes of the long were reinterpreted as an int) |
| long | 1234567890123456789L | - | Double | 1234567890123456770.0d (some widening conversions can lose precision in a well-defined way) |
| long | 0x8000000000000000L (MIN_VALUE) | - | Integer | 0 (narrowing conversion uses the last four bytes) |
| string | "1234.5" | - | Integer | Error -- Cannot parse floating point without a format. |
| string | "1234.5" | # | Integer | 1234 (the format discards after the decimal point) |
| string | "1234.5" | #.# | Integer | 1234 (even a format with a decimal point helps convert the input string into a number) |
| boolean | false | - | Integer | 0 |
| boolean | true | - | Integer | 1 |
| boolean | false | - | Date | 1970-01-01 (zero days since 1970-01-01) |
| boolean | true | - | Date | 1970-01-02 (one day since 1970-01-01) |
| boolean | false | - | Time | 00:00:00.000 (zero milliseconds since midnight) |
| boolean | true | - | Time | 00:00:00.001 (one millisecond since midnight, note that if your view does not show milliseconds, this will look exactly like false even though the underlying data is different) |
| timestamp-millis | 2017-11-28T12:44:22Z | yyyyMMdd | String | 20171128 (Note: The conversion timestamp-millis > String does not work on Test datasets.) |
| String | 20171128 | yyyyMMdd | timestamp-millis | 2017-11-28T00:00:00Z (hours, minutes and seconds are 0) |
| String | "20171128" | yyyyMMdd | Date | 2017-11-28 |
| int | 20171128 | - | Date | +57196-09-03 (20,171,128 days after 1970-01-01) |
| time-millis | 12:44:22 | - | DateTime | 1970-01-01T12:44:22Z (since there is no date part in the source time, 1970-01-01 is used) |
| timestamp-millis | 2017-11-28T12:44:22Z | - | Date | 2017-11-28 (the time component is removed, the underlying number is changed from 1511873062123L to 17498) |
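
The numeric rows of the table follow standard Java primitive conversion rules, so they can be checked with a few casts. This is a sketch under the assumption that the processor behaves like plain Java casts and DecimalFormat parsing; the class name PrimitiveConversions and its methods are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

class PrimitiveConversions {

    // Widening int -> long never loses information.
    static long widen(int value) {
        return value;
    }

    // Narrowing long -> int keeps only the lowest four bytes.
    static int narrow(long value) {
        return (int) value;
    }

    // Widening long -> double can lose precision for large magnitudes.
    static double toDouble(long value) {
        return value;
    }

    // String -> int. Without a pattern, falls back to Integer.valueOf();
    // with a pattern, parses via DecimalFormat and truncates the fraction.
    static int parseInt(String text, String pattern) throws ParseException {
        if (pattern == null) {
            return Integer.valueOf(text);
        }
        DecimalFormat format =
                new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return format.parse(text).intValue();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(widen(12345));                  // 12345
        System.out.println(narrow(1234567890123456789L));  // 2112454933
        System.out.println(narrow(Long.MIN_VALUE));        // 0
        System.out.println(toDouble(1234567890123456789L));
        System.out.println(parseInt("1234.5", "#"));       // 1234
        System.out.println(parseInt("1234.5", "#.#"));     // 1234
        try {
            parseInt("1234.5", null);
        } catch (NumberFormatException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```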

Note: The Int to String conversion is not supported yet by Talend. See this known issue for more information.