Kafka and AVRO in a Job

In a Talend Job, the regular Kafka components and the Kafka components for AVRO handle AVRO data differently, reflecting the two approaches AVRO provides to serialize and deserialize AVRO data.
  • The regular Kafka components read and write JSON and AVRO formats. If your Kafka cluster produces or consumes AVRO data, you can use tKafkaInput and tKafkaOutput with Producer and Consumer records along with a schema registry in your Standard Job.
  • The Kafka components in the Spark framework handle data directly in the AVRO format. If your Kafka cluster produces and consumes AVRO data, you can use tKafkaInputAvro to read data directly from Kafka and tWriteAvroFields to send AVRO data to tKafkaOutput.

    However, these components do not handle AVRO data created by the avro-tools library, because avro-tools and the components for AVRO do not use the same (de)serialization approach provided by AVRO.

The two approaches AVRO provides to (de)serialize AVRO data are as follows:
  1. AVRO files are generated with the embedded AVRO schema in each file (via org.apache.avro.file.{DataFileWriter/DataFileReader}). The avro-tools libraries use this approach.
  2. AVRO records are generated without embedding the schema in each record (via org.apache.avro.io.{BinaryEncoder/BinaryDecoder}). The Kafka components for AVRO use this approach.

    This approach is recommended when AVRO-encoded messages are continuously written to a Kafka topic, because it avoids the overhead of re-embedding the AVRO schema in every single message. This is a significant advantage over the other approach when using Spark Streaming to read data from or write data to Kafka: records (messages) are usually small while the AVRO schema is relatively large, so embedding the schema in each message is not cost-effective.
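A rough back-of-the-envelope calculation shows why the schema-less approach pays off for a high-volume topic. The sizes below are hypothetical, chosen only for illustration; the real numbers depend on your schema and records.

```python
# Hypothetical sizes, for illustration only.
SCHEMA_BYTES = 1024     # a moderately complex AVRO schema as JSON text
RECORD_BYTES = 50       # a typical small Kafka message payload
N_MESSAGES = 1_000_000

# Approach 1: the schema travels inside every message.
embedded = N_MESSAGES * (SCHEMA_BYTES + RECORD_BYTES)

# Approach 2: the schema is shipped once (e.g. via a schema registry),
# and each message carries only the encoded record.
schema_less = SCHEMA_BYTES + N_MESSAGES * RECORD_BYTES

print(f"embedded:    {embedded / 1e6:.0f} MB")    # → 1074 MB
print(f"schema-less: {schema_less / 1e6:.0f} MB") # → 50 MB
```

With these assumed sizes, the schema-less stream is roughly 21 times smaller, and the gap grows as records get smaller relative to the schema.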

The outputs of the two approaches cannot be mixed in the same read-write process.
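To see why the two outputs cannot be mixed, consider a much-simplified stand-in for the two formats (this is not the real AVRO wire format; the magic bytes, length prefixes, and helper names are invented for the sketch). A container file starts with magic bytes and an embedded schema, so a schema-less record reader that expects raw records misinterprets those header bytes as record data:

```python
import json
import struct

MAGIC = b"Obj\x01"  # invented stand-in for a container-file magic header

def write_container(schema: dict, records: list[bytes]) -> bytes:
    """Approach 1 (simplified): embed the schema once, then the records."""
    s = json.dumps(schema).encode()
    out = MAGIC + struct.pack(">I", len(s)) + s
    for r in records:
        out += struct.pack(">I", len(r)) + r
    return out

def read_bare_records(payload: bytes) -> list[bytes]:
    """Approach 2 (simplified): a schema-less stream of length-prefixed records."""
    recs, pos = [], 0
    while pos < len(payload):
        (n,) = struct.unpack_from(">I", payload, pos)
        recs.append(payload[pos + 4 : pos + 4 + n])
        pos += 4 + n
    return recs

records = [b"alice", b"bob"]
container = write_container({"type": "record"}, records)

# The schema-less reader decodes the magic bytes b"Obj\x01" as a huge
# bogus record length, so the original records are never recovered.
assert read_bare_records(container) != records
```

The failure is silent rather than an explicit error, which is exactly why a single read-write process must stick to one approach end to end.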
