tKafkaConfiguration properties for Apache Spark Streaming

These properties are used to configure tKafkaConfiguration running in the Spark Streaming Job framework.

The Standard tKafkaConfiguration component belongs to the Internet family.

The component in this framework is available in Talend Real-Time Big Data Platform and in Talend Data Fabric.

Basic settings

Broker list	Enter the addresses of the broker nodes of the Kafka cluster to be used. Each address must take the form hostname:port, where hostname and port are the name and the port of a hosting node in this Kafka cluster. If you need to specify several addresses, separate them using a comma (,).
Use SSL/TLS	Select this check box to enable the SSL or TLS encrypted connection. Then you need to use the tSetKeystore component in the same Job to specify the encryption information. This check box is available since Kafka 0.9.0.1.
Use Kerberos authentication	If the Kafka cluster to be used is secured with Kerberos, select this check box to display the related parameters to be defined: JAAS configuration path: enter the path, or browse to the JAAS configuration file to be used by the Job to authenticate as a client to Kafka. This JAAS file describes how the clients (in Talend terms, the Kafka-related Jobs) can connect to the Kafka broker nodes, using either the kinit mode or the keytab mode. The JAAS file must be stored on the machine where these Jobs are executed. Neither Talend, Kerberos, nor Kafka provides this JAAS file. You need to create it by following the explanation in Configuring Kafka client, depending on the security strategy of your organization. Kafka brokers principal name: enter the primary part of the Kerberos principal you defined for the brokers when you were creating the broker cluster. For example, in the principal kafka/kafka1.hostname.com@EXAMPLE.COM, the primary part to enter in this field is kafka. Set kinit command path: Kerberos uses a default path to its kinit executable. If you have changed this path, select this check box and enter the custom access path. If you leave this check box clear, the default path is used. Set Kerberos configuration path: Kerberos uses a default path to its configuration file, for example the krb5.conf file (or krb5.ini in Windows) for Kerberos 5. If you have changed this path, select this check box and enter the custom access path to the Kerberos configuration file. If you leave this check box clear, Kerberos applies its own strategy to attempt to find the configuration information it requires. For details about this strategy, see the Locating the krb5.conf Configuration File section in Kerberos requirements. For further information about how a Kafka cluster is secured with Kerberos, see Authenticating using SASL. This check box is available since Kafka 0.9.0.1.
Use Schema Registry	Select this check box to use Confluent Schema Registry and to display the related parameters to be defined: URL: enter the Schema Registry instance URL. Basic authentication: select this check box and enter your credentials in the Username and Password fields. Use the keystore of Kafka broker: select this check box to enable the SSL or TLS encrypted connection using the same tSetKeystore component used by the Kafka broker. This check box is available when you select the Use SSL/TLS check box and clear the Set schema registry keystore check box. Set schema registry keystore: select this check box to enable the SSL or TLS encrypted connection. Then you need to use the tSetKeystore component in the same Job to specify the encryption information. For more information about Schema Registry, see the Confluent documentation. This option is available when you have installed the 8.0.1-R2022-12 Talend Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
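
The value formats described for Broker list and Kafka brokers principal name can be checked before they are entered in the component. The following is a minimal sketch, not part of Talend or Kafka: the helper names parse_broker_list and principal_primary are hypothetical, and the second host name and port 9092 are assumed sample values.

```python
def parse_broker_list(broker_list):
    """Split a comma-separated broker list into (hostname, port) pairs."""
    brokers = []
    for entry in broker_list.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is always the final field.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected hostname:port, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        brokers.append((host, port_number))
    return brokers


def principal_primary(principal):
    """Return the primary part of a principal written primary/instance@REALM."""
    name = principal.split("@", 1)[0]   # drop the realm
    return name.split("/", 1)[0]        # drop the instance


# Sample values (assumed): two brokers on port 9092.
print(parse_broker_list("kafka1.hostname.com:9092, kafka2.hostname.com:9092"))
print(principal_primary("kafka/kafka1.hostname.com@EXAMPLE.COM"))  # kafka
```

A malformed entry, such as an address without a port, raises ValueError instead of being passed on silently.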

Advanced settings

Connection pool	In this area, you configure, for each Spark executor, the connection pool used to control the number of connections that stay open simultaneously. The default values of the following connection pool parameters are suitable for most use cases. Max total number of connections: enter the maximum number of connections (idle or active) that are allowed to stay open simultaneously. The default number is 8. If you enter -1, you allow an unlimited number of connections to be open at the same time. Max waiting time (ms): enter the maximum amount of time that the connection pool can take to respond to a request for a connection. By default, it is -1, that is to say, infinite. Min number of idle connections: enter the minimum number of idle connections (connections not used) maintained in the connection pool. Max number of idle connections: enter the maximum number of idle connections (connections not used) maintained in the connection pool.
Evict connections	Select this check box to define criteria to destroy connections in the connection pool. The following fields are displayed once you have selected it. Time between two eviction runs: enter the time interval (in milliseconds) at the end of which the component checks the status of the connections and destroys the idle ones. Min idle time for a connection to be eligible to eviction: enter the time interval (in milliseconds) at the end of which the idle connections are destroyed. Soft min idle time for a connection to be eligible to eviction: this parameter works the same way as Min idle time for a connection to be eligible to eviction, but it always keeps at least the minimum number of idle connections, that is, the number you define in the Min number of idle connections field.
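
The difference between the two idle-time criteria can be illustrated with a small simulation of a single eviction run. This is a sketch of the rules as described above, not the component's actual implementation: the function name select_evictions is hypothetical, and treating a negative threshold as "criterion disabled" is an assumption of this example.

```python
def select_evictions(idle_ms, min_idle_time_ms, soft_min_idle_time_ms, min_idle):
    """Return the indexes of idle connections one eviction run would destroy.

    idle_ms lists how long each idle connection has been idle, in milliseconds.
    A negative threshold disables that criterion (assumption of this sketch).
    """
    evicted = []
    remaining = len(idle_ms)
    # Examine the longest-idle connections first.
    order = sorted(range(len(idle_ms)), key=lambda i: idle_ms[i], reverse=True)
    for index in order:
        idle = idle_ms[index]
        # Min idle time: evict regardless of how many idle connections remain.
        hard = min_idle_time_ms >= 0 and idle >= min_idle_time_ms
        # Soft min idle time: evict only while more than min_idle remain.
        soft = (soft_min_idle_time_ms >= 0
                and idle >= soft_min_idle_time_ms
                and remaining > min_idle)
        if hard or soft:
            evicted.append(index)
            remaining -= 1
    return sorted(evicted)


idle_ms = [1000, 5000, 9000, 20000]
print(select_evictions(idle_ms, 8000, -1, min_idle=2))  # [2, 3]
print(select_evictions(idle_ms, -1, 4000, min_idle=3))  # [3]
```

In the second call, three connections exceed the soft threshold, but only one is destroyed because the run stops evicting once the number of idle connections drops to the configured minimum.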

Usage

Usage rule	This component is used standalone to create the Kafka connection that the other Kafka components can reuse.