Apache Kafka - Import - 7.1

Talend Data Catalog Bridges

Talend Documentation Team
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Talend Data Catalog

Bridge Requirements

This bridge:
  • requires Internet access to https://repo.maven.apache.org/maven2/ and/or other tool sites to download drivers into <TDC_HOME>/data/download/MIMB/. For more information on how to retrieve third-party drivers when the TDC server cannot access the Internet, see this article.

Bridge Specifications

Vendor Apache
Tool Name Kafka
Tool Version 1.0
Tool Web Site http://kafka.apache.org/
Supported Methodology [File System] Multi-Model, Data Store (NoSQL / Hierarchical, Physical Data Model) via Java API on Kafka File
Incremental Harvesting
Remote Repository Browsing for Model Selection
Data Profiling
Multi-Model Harvesting

Import tool: Apache Kafka 1.0 (http://kafka.apache.org/)
Import interface: [File System] Multi-Model, Data Store (NoSQL / Hierarchical, Physical Data Model) via Java API on Kafka File from Apache Kafka
Import bridge: 'ApacheKafka' 10.1.0

This bridge requires internet access to https://repo.maven.apache.org/maven2/ (and exceptionally a few other tool sites)
in order to download the necessary third party software libraries into $HOME/data/download/MIMB/
(such directory can be copied from another MIMB server with internet access).
By running this bridge, you hereby acknowledge responsibility for the license terms and any potential security vulnerabilities from these downloaded third party software libraries.


Loads metadata from all or specified Kafka topics. Each topic is assumed to have messages of the same type.
The bridge samples the latest messages to figure out their common structure.
This bridge supports the following message formats:
- Delimited File (CSV)
- Open Office Excel (XLSX)
- COBOL Copybook
- JSON (JavaScript Object Notation)
- Apache Avro
- Apache Parquet
- Apache ORC

as well as the compressed versions of the above formats:
- ZIP (as a compression format, not as an archive format)
- LZ4
- Snappy (as standard Snappy format, not as Hadoop native Snappy format)
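To illustrate the structure-inference step described above, here is a minimal, hypothetical sketch (not the bridge's actual implementation) that merges field names and sampled value types across a batch of flat JSON messages:

```python
import json

def infer_common_structure(messages):
    """Merge field names and sampled Python type names across JSON messages.

    Each message is assumed to be a flat JSON object; a field seen with
    several value types is reported with all of them.
    """
    fields = {}
    for raw in messages:
        record = json.loads(raw)
        for name, value in record.items():
            fields.setdefault(name, set()).add(type(value).__name__)
    return {name: sorted(types) for name, types in fields.items()}

samples = [
    '{"id": 1, "name": "alice", "score": 9.5}',
    '{"id": 2, "name": "bob", "score": 8, "tags": ["new"]}',
]
print(infer_common_structure(samples))
# {'id': ['int'], 'name': ['str'], 'score': ['float', 'int'], 'tags': ['list']}
```

A real implementation must also handle nested objects and the non-JSON formats listed above, but the principle is the same: the more messages sampled, the more complete the inferred topic structure.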

Known issue: When both the Kafka cluster (server) version 1.1.x and the bridge (client) run on Windows systems, the import may fail with a timeout error. Kafka version 2.0.x resolves this issue.

When you are connecting to Kafka using PLAIN authentication, you need to specify the 'JAAS configuration path' parameter and leave the 'Kafka brokers principal name' parameter empty.
When you are connecting to Kafka using KERBEROS authentication, you should specify values for both parameters.
When you are connecting to Kafka without authentication, you need to leave both of these parameters empty.

Please refer to the individual parameter's documentation for more details.
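For reference, a JAAS file for the KERBEROS case in keytab mode typically looks like the following. The keytab path and principal shown here are placeholders for illustration, not values shipped with the bridge:

```
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/path/to/kafka-client.keytab"
  principal="kafka-client@EXAMPLE.COM";
};
```

For kinit mode, the ticket cache is used instead of a keytab, with the entry `useTicketCache=true` in place of the `useKeyTab`, `storeKey`, `keyTab` and `principal` options.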

Bridge Parameters

Parameter Name Description Type Values Default Scope
Driver version Choose the driver version according to the Kafka API:
The version is used to load the necessary version-specific libraries.
Bootstrap servers List of host:port pairs to use for establishing the initial connection to the Kafka cluster, and finding available servers and topics.
For example, 'host1:port1, host2:port2'.
The list does not need to include all available servers but should have at least one.
You may want to include more than one server in case any of them are down.
STRING   localhost:9092 Mandatory
Topics List of topic names, like 'topic1, topic2'.
An empty list means that all available topics are imported.
You can specify topic names as a wildcard pattern: 'topic?', '*topic*' or 'topic_?,*topic*'.
Number of sample messages The maximum number of messages to sample from topics. These messages are used to identify topic format details, like field names and data types.
When empty, the number of sample messages is assumed to be 1000.
STRING   1000  
Use SSL protocol to connect Set this parameter to True when the Kafka consumer uses TLS/SSL to encrypt Kafka's network traffic.

Kafka uses SSL to encrypt connections between the server and clients.
Truststore file The location of the trust store file.
If it is empty, the bridge tries to locate it in 'java.home'\lib\security\{'jssecacerts'|'cacerts'}.
FILE *.*    
Password of the truststore Password of the truststore. PASSWORD      
JAAS configuration path Enter the path, or browse to the JAAS configuration file to be used by the bridge to authenticate as a client to Kafka. This JAAS file describes how the clients can connect to the Kafka broker nodes, using either the kinit mode or the keytab mode. It must be stored on the machine where this bridge is executed. MITI does not provide this JAAS file. You need to create it by following the explanation in Configuring Kafka client, depending on the security strategy of your organization.
This value is passed to the JVM as -Djava.security.auth.login.config=value
FILE *.*    
Kafka brokers principal name Enter the primary part of the Kerberos principal you defined for the brokers when you were creating the broker cluster. For example, in this principal kafka/kafka1.hostname.com@EXAMPLE.COM, the primary part to be used to fill in this field is kafka.
This value is passed to the Kafka property: sasl.kerberos.service.name=value
kinit command path Kerberos uses a default path to its kinit executable. If you have changed this path, select this check box and enter the custom access path. If you leave this check box clear, the default path is used.
This value is passed to the Kafka property: sasl.kerberos.kinit.cmd=value
Kerberos configuration path Kerberos uses a default path to its configuration file, the krb5.conf file (or krb5.ini in Windows) for Kerberos 5 for example. If you leave this parameter clear, a given strategy is applied by Kerberos to attempt to find the configuration information it requires. For details about this strategy, see the Locating the krb5.conf Configuration File section in Kerberos requirements.
This value is passed to the JVM as -Djava.security.krb5.conf=value
FILE *.*    
Miscellaneous Specify miscellaneous options. STRING      
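The wildcard behavior of the Topics parameter described above can be approximated with glob-style matching. The following is an illustrative sketch, not the bridge's implementation:

```python
from fnmatch import fnmatchcase

def select_topics(patterns, available):
    """Expand a comma-separated wildcard list against available topic names."""
    wanted = [p.strip() for p in patterns.split(",") if p.strip()]
    if not wanted:  # an empty list means all available topics
        return sorted(available)
    return sorted(t for t in available
                  if any(fnmatchcase(t, p) for p in wanted))

topics = ["orders", "orders_eu", "payments", "topic1", "topic2"]
print(select_topics("topic?,*orders*", topics))
# ['orders', 'orders_eu', 'topic1', 'topic2']
```

Here '?' matches exactly one character and '*' matches any run of characters, which matches the 'topic?' and '*topic*' examples given for the Topics parameter.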


Bridge Mapping

Mapping information is not available