COBOL Considerations - 7.3

Talend Data Mapper User Guide

Version: 7.3
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development > Designing Jobs
Last publication date: 2023-01-05

While it is possible to work with COBOL files using the specific COBOL representation, they can also be handled through the flat representation. This section describes some factors to take into account when working with COBOL files using the flat representation.

The properties for the COBOL importer are:
  • Character Encoding - Specify the character encoding for the data to be processed. For IBM COBOL this is typically an EBCDIC encoding such as CP037 (also known as IBM037). For other environments, it is more likely to be an ASCII-compatible encoding such as UTF-8. See Character encoding for more information.

  • Copybook format - Specifies the format of the copybook source. If there are no sequence numbers, use the Free form option. If there are sequence numbers, use the column specifications to indicate where the actual source starts (past the sequence number and continuation character). The standard column values are the default.

  • Is each record separated by a newline character? - Select this in the relatively rare case where each record is separated by a newline that is not specified in the COBOL definition. That is, the data is a sequence of positional records, each separated by a newline character. If you use this option, the Record element in the generated structure has a newline as its element terminator. You can change the terminator to another sequence of characters if desired.

  • Should reference structures be created alongside the main structure? - If you select this checkbox, additional structures called reference structures are created. The main structure inherits from the reference structure. There is one reference structure for the entire copybook and further reference structures for the REDEFINES (choices). In the REDEFINES, each alternative becomes an individual reference structure.
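
As an illustration of the Character Encoding and newline options above, the following minimal Python sketch decodes EBCDIC data and splits it into records. It is not part of Talend Data Mapper: the record length, the sample data, and the helper names split_fixed and split_newline are assumptions made for the example.

```python
RECORD_LENGTH = 8  # assumed size of one positional record, in bytes


def split_fixed(data: bytes, record_length: int) -> list:
    """Split back-to-back positional records that have no separator."""
    if len(data) % record_length != 0:
        raise ValueError("data length is not a multiple of the record length")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def split_newline(data: bytes, newline: bytes) -> list:
    """Split records that are each terminated by a newline sequence."""
    return [record for record in data.split(newline) if record]


# Hypothetical sample: two records encoded with Python's built-in cp037 codec.
packed = "HELLO   WORLD   ".encode("cp037")
fixed_records = [r.decode("cp037") for r in split_fixed(packed, RECORD_LENGTH)]

# The same records, each followed by a line feed encoded in cp037.
newline = "\n".encode("cp037")
separated = "HELLO   \nWORLD   \n".encode("cp037")
newline_records = [r.decode("cp037") for r in split_newline(separated, newline)]

print(fixed_records)
print(newline_records)
```

Splitting on a newline byte is only safe when the whole record is character data, because binary or packed fields can contain the same byte value.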

The COBOL importer generates structures for each top-level record definition in the copybook being imported.

The following table shows how elementary COBOL data items are mapped to data types based on their USAGE and PICTURE clauses. For simplicity, this table uses the following abbreviations:
  • BINARY - An item with COBOL USAGE BINARY, COMP, COMP-4 or COMP-5.

  • PACKED-DECIMAL - An item with COBOL USAGE COMP-3 or PACKED-DECIMAL.

  • ZONED-DECIMAL - An item with COBOL USAGE DISPLAY, whose PICTURE clause only contains the 9, V, S or P symbols.

 

COBOL item                                     Data Type              Data Format
---------------------------------------------  ---------------------  -------------------------
BINARY, signed, totalDigits < 5                Short (16)             -
BINARY, unsigned, totalDigits < 5              Unsigned Short (16)    -
BINARY, signed, totalDigits < 10               Integer (32)           -
BINARY, unsigned, totalDigits < 10             Unsigned Integer (32)  -
BINARY, totalDigits < 19                       Long (64)              -
PACKED-DECIMAL, signed                         Decimal                DF_DEC_PACKED_SIGNED
PACKED-DECIMAL, unsigned                       Decimal                DF_DEC_PACKED
COMP-1                                         Float (32)             -
COMP-2                                         Double (64)            -
ZONED-DECIMAL, signed, SIGN LEADING SEPARATE   Decimal                DF_DEC_ZONED_LEADING_SEP
ZONED-DECIMAL, signed, SIGN LEADING            Decimal                DF_DEC_ZONED_LEADING
ZONED-DECIMAL, signed, SIGN TRAILING SEPARATE  Decimal                DF_DEC_ZONED_TRAILING_SEP
ZONED-DECIMAL, signed, SIGN TRAILING           Decimal                DF_DEC_ZONED
ZONED-DECIMAL, unsigned                        Decimal                -
DISPLAY, BLANK WHEN ZERO                       Decimal                DF_DEC_BWZ
DISPLAY, other PICTURE symbols                 String                 -

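
To make the packed and zoned formats in the table more concrete, here is a minimal Python sketch of the byte layouts conventionally used for these items on IBM mainframes. It only illustrates the data layout and is not the importer's implementation: the function names unpack_comp3 and unzone_trailing are hypothetical, and the zoned example assumes EBCDIC data with the sign held in the zone of the last byte.

```python
from decimal import Decimal

NEGATIVE_SIGNS = {0x0B, 0x0D}  # sign nibbles conventionally treated as negative


def unpack_comp3(raw: bytes, decimal_places: int = 0) -> Decimal:
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    if not raw:
        raise ValueError("empty field")
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble")
    value = int("".join(str(d) for d in digits))
    if sign in NEGATIVE_SIGNS:
        value = -value
    # Apply the implied decimal point (the V in the PICTURE clause).
    return Decimal(value).scaleb(-decimal_places)


def unzone_trailing(raw: bytes, decimal_places: int = 0) -> Decimal:
    """Decode an EBCDIC zoned-decimal field whose sign is in the last byte's zone."""
    if not raw:
        raise ValueError("empty field")
    digits = [byte & 0x0F for byte in raw]
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble")
    value = int("".join(str(d) for d in digits))
    if (raw[-1] >> 4) in NEGATIVE_SIGNS:
        value = -value
    return Decimal(value).scaleb(-decimal_places)


# PIC S9(3)V99 COMP-3 holding -123.45 is stored as the bytes 12 34 5D.
print(unpack_comp3(bytes([0x12, 0x34, 0x5D]), decimal_places=2))
# PIC S9(3) DISPLAY holding -123 is stored in EBCDIC as F1 F2 D3.
print(unzone_trailing(bytes([0xF1, 0xF2, 0xD3])))
```

The decimal_places argument plays the role of the implied decimal point: the stored digits carry no separator, so the scale has to come from the field definition.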
The COBOL importer supports the following COBOL features:
  • Numeric Scaling - The implied decimal character V in the PICTURE clause is implemented using the Decimal Places property of the element.

  • Level 88 - Level 88 clauses are supported using an Element Type of Value for the element (as with code values). The name of the Level 88 clause is included as the description of the value element.

  • OCCURS DEPENDING ON - The element's minimum and maximum occurrences are set based on the range of occurrences specified. To implement the DEPENDING ON portion, use a FixedLoop function taking the value of the element that the occurs depends on.

  • REDEFINES - The REDEFINES clause is implemented using the Group Type of Choice. The Group Type of the parent element of the element containing the REDEFINES clause is set to Choice. Each element with a REDEFINES clause is a branch of the choice. See below for restrictions on this.

    You can use the IsPresent expression to define the condition that determines which REDEFINES alternative (member of the Choice) is available when reading the input. By default, a Constant IsPresent expression is generated for each Choice member: the first member gets the value true and the rest get false. This makes it easy to change the expression if you want to unconditionally select a different member.

  • REDEFINES Record Types - When records are redefined using REDEFINES, for each member of the Choice (see above), if the first field has a single level 88 constant, a delimited initiator is generated for the field and the length of the field is reduced by the size of the level 88 constant (typically to zero). This allows the flat reader and writer to consume or generate these constants automatically and thereby select the correct record.

  • Binary vs. Character (Newline) - If all of the data is determined to be character encoded, detection of a newline that separates the records is optionally (and by default) added to the structure during the import. Make sure that the newline character in the representation properties is correct for the type of data you are importing. By default, the newline character is set to a line feed character.
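
The OCCURS DEPENDING ON and REDEFINES Record Types items above can be pictured with a small Python sketch. It mirrors the ideas only (a count field that drives the number of occurrences, and a leading constant that selects an alternative) and does not use or reproduce FixedLoop or IsPresent. The record layouts, the tags "H" and "D", and all function names are hypothetical.

```python
def read_occurs(record: str, count_width: int, item_width: int, max_items: int) -> list:
    """Read a count field, then that many fixed-width occurrences."""
    count = int(record[:count_width])
    if not 0 <= count <= max_items:
        raise ValueError(f"count {count} outside 0..{max_items}")
    body = record[count_width:]
    if len(body) < count * item_width:
        raise ValueError("record too short for the declared count")
    return [body[i * item_width:(i + 1) * item_width] for i in range(count)]


# Hypothetical REDEFINES alternatives, keyed by the constant that a level 88
# item would declare on the first field of each alternative.
def parse_header(rest: str) -> dict:
    return {"kind": "header", "date": rest[:8]}


def parse_detail(rest: str) -> dict:
    return {"kind": "detail", "quantity": int(rest[:4]), "name": rest[4:].rstrip()}


BRANCHES = {"H": parse_header, "D": parse_detail}


def parse_record(record: str) -> dict:
    """Pick the alternative whose leading constant matches, then parse the rest."""
    tag, rest = record[:1], record[1:]
    if tag not in BRANCHES:
        raise ValueError(f"no alternative matches record type {tag!r}")
    return BRANCHES[tag](rest)


print(read_occurs("03AAABBBCCC", count_width=2, item_width=3, max_items=5))
print(parse_record("H20230105"))
print(parse_record("D0042WIDGET  "))
```

Consuming the tag before parsing the rest corresponds to the field length being reduced by the size of the constant: once the constant has identified the alternative, it is no longer part of the field's data.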

The COBOL copybook import has the following issues and limitations:
  • Alignment or SYNCHRONIZED - No adjustment is made when SYNCHRONIZED values are encountered. Also, no attempt is made to align binary data as may be required for certain architectures. Such items may need to be aligned to the nearest 32-bit boundary in the record, depending on the compiler and platform that were used. If an adjustment is required, you must make it manually by adding the appropriate filler.

  • Level 88 for Non-leaf Elements - Level 88 values are not supported for non-leaf elements. The importer issues a warning and ignores them.

  • REDEFINES Padding - No additional padding is added to any of the element subtrees that are involved in a REDEFINES. If you are dealing with positional fixed-length records with no line delimiters, you need to make sure that the appropriate padding is provided (usually using FILLER declarations) so that every alternative occupies the same number of bytes.
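
When adding filler manually, the amount needed is simple arithmetic. The Python sketch below shows one way to work it out, assuming a 4-byte (32-bit) boundary for alignment and assuming that every REDEFINES alternative should be padded to the length of the longest one. The function names and the sample offsets and lengths are hypothetical; the real requirements depend on the compiler and platform.

```python
def filler_for_alignment(offset: int, alignment: int = 4) -> int:
    """Bytes of filler needed so that a field at `offset` starts on a boundary."""
    if offset < 0 or alignment <= 0:
        raise ValueError("offset must be >= 0 and alignment must be > 0")
    return (-offset) % alignment


def filler_for_redefines(branch_lengths: dict) -> dict:
    """Bytes of filler each alternative needs to match the longest alternative."""
    longest = max(branch_lengths.values())
    return {name: longest - length for name, length in branch_lengths.items()}


# A binary item that would otherwise start at byte offset 5 needs 3 filler bytes.
print(filler_for_alignment(5))
# Hypothetical alternatives of 20 and 32 bytes: the shorter one needs 12 bytes.
print(filler_for_redefines({"HEADER": 20, "DETAIL": 32}))
```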