Kafka JSON schema and limitations - Cloud

Talend Cloud Apps Connectors Guide

Version
Cloud
Language
English
Product
Talend Cloud
Module
Talend Data Inventory
Talend Data Preparation
Talend Pipeline Designer
Content
Administration and Monitoring > Managing connections
Design and Development > Designing Pipelines
Last publication date
2024-03-21

When creating a Kafka dataset, you can enter a custom JSON schema, which is then used when reading from or writing to the selected topic.

Caveats for working with JSON and Kafka input

The current implementation of JSON support in Kafka works as follows:

  • The schema is inferred from the first JSON record; this schema is then used to convert all subsequent JSON records.
  • If a JSON record does not match the inferred schema, it is silently dropped (with a debug message).
Consider a Kafka topic containing the following JSON records:
1 - {"title":"The Matrix","year":1999,"cast":["Keanu Reeves","Laurence Fishburne","Carrie-Anne Moss","Hugo Weaving","Joe Pantoliano"],"genres":["Science Fiction"]}
2 - {"Test" : true}
3 - {"title":"Toy Story","year":1995,"cast":["Tim Allen","Tom Hanks","(voices)"],"genres":["Animated"]}
The Kafka input connector handles these messages as follows:
  • Infer the schema from the first incoming JSON record (message number 1).
  • Forward message number 1 to the next connector.
  • Drop message number 2 as it does not match the inferred schema.
  • Forward message number 3 to the next connector as it matches the inferred schema.
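The inference-and-filter behavior described above can be illustrated with a minimal Python sketch. This is not Talend's actual implementation; it only mimics the documented behavior (infer a schema from the first record, forward matching records, drop the rest), using the example records from this page:

```python
import json

def infer_schema(record):
    """Infer a simple schema (field name -> Python type) from a JSON record."""
    return {key: type(value) for key, value in record.items()}

def matches(record, schema):
    """Check that the record has exactly the inferred fields with matching types."""
    if set(record) != set(schema):
        return False
    return all(isinstance(record[key], schema[key]) for key in record)

messages = [
    '{"title":"The Matrix","year":1999,"cast":["Keanu Reeves"],"genres":["Science Fiction"]}',
    '{"Test" : true}',
    '{"title":"Toy Story","year":1995,"cast":["Tim Allen"],"genres":["Animated"]}',
]

schema = None
forwarded = []
for raw in messages:
    record = json.loads(raw)
    if schema is None:
        # The schema is inferred from the first record only.
        schema = infer_schema(record)
    if matches(record, schema):
        forwarded.append(record)  # forwarded to the next connector
    # otherwise the record is dropped silently (Talend logs a debug message)

# forwarded now holds messages 1 and 3; message 2 was dropped
```

Running this sketch forwards the two movie records and drops the {"Test" : true} record, matching the connector behavior listed above.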

Caveats for working with JSON and Kafka output

The Kafka output connector cannot properly handle the Bytes type.