Talend Data Preparation in cluster mode

Talend Real-time Big Data Platform Installation Guide for Linux

EnrichVersion: 6.3
EnrichProdName: Talend Real-Time Big Data Platform
task: Installation and Upgrade

You can install several instances of Talend Data Preparation in cluster mode to benefit from high availability and better scalability.

Clustering is the process of grouping together a set of similar physical systems in order to ensure a level of operational continuity and minimize the risk of unplanned downtime, in particular by taking advantage of load balancing and failover features.

This documentation provides the procedures to set up a cluster for Talend Data Preparation, and optionally Talend Dictionary Service.

Architecture of Talend Data Preparation in cluster mode

The following diagram illustrates the architecture behind Talend Data Preparation and Talend Dictionary Service when set up in cluster mode.

This architecture is composed of several functional blocks:

  • A Load Balancer, which distributes the workload generated by the users who access the Talend Data Preparation Web application and the Talend Dictionary Service server at the same time.

  • The Talend Data Preparation instances, connected by a Network File System or any shared folder available to all the Talend Data Preparation instances.

  • The Talend Dictionary Service instances that you can optionally install if you want to add, remove, or edit the semantic types used on data in Talend Data Preparation.

  • A block containing the various components necessary for Talend Data Preparation and Talend Dictionary Service to work, namely several instances of MongoDB for storage, Kafka and Zookeeper for messaging, and an instance of Talend Administration Center to manage authorizations.

Installing Talend Data Preparation in cluster mode

To install Talend Data Preparation in cluster mode, you need to make some modifications in the <Data_Preparation_Path>/config/application.properties configuration file.

To perform this installation, you need to install and configure as many instances of Talend Data Preparation and its dependencies as necessary.

Prerequisites:

  • You have configured a Load Balancer for each module, namely Talend Data Preparation and optionally Talend Dictionary Service.

  • You have configured MongoDB in cluster mode. For more information, see MongoDB documentation.

  • You have configured Kafka and Zookeeper in cluster mode. For more information, see Zookeeper documentation and Kafka documentation.

  1. Install a first Talend Data Preparation instance.

    For more information on the Talend Data Preparation installation procedure, see Installing and configuring Talend Data Preparation.

  2. In the <Data_Preparation_Path>/config/application.properties file, edit the mongodb.host property to specify the hosts and ports of the several MongoDB instances.

    Use the following syntax:

    mongodb.host=<host1>:<port1>,<host2>:<port2>,…,<hostN>

    The hosts and ports of the different MongoDB instances must be concatenated in a comma-separated list. The last host is written without a port and inherits the value of the mongodb.port property. For example:

    mongodb.host=mongorep-mongodb-replica-1.mongorep-mongodb-replica.default.svc.cluster.local:27017,
    mongorep-mongodb-replica-0.mongorep-mongodb-replica.default.svc.cluster.local:27017,
    mongorep-mongodb-replica-2.mongorep-mongodb-replica.default.svc.cluster.local:27017,
    mongorep-mongodb-replica-3.mongorep-mongodb-replica.default.svc.cluster.local
    mongodb.port=27017

  3. Edit the service.cache.file.location and dataset.content.store.file.location properties to specify the location of your Network File System or shared folder, which must be available to all the Talend Data Preparation instances. For example:

    service.cache.file.location=sharedContent/
    dataset.content.store.file.location=sharedContent/store/datasets/content/

  4. If you want to use Talend Data Preparation with Talend Dictionary Service to add, edit, or remove semantic types, edit the properties specifying the hosts and ports for the Kafka and Zookeeper instances.

    As with the MongoDB hosts, the Kafka and Zookeeper hosts and ports must be concatenated in a comma-separated list. The last host is written without a port and inherits it from the dedicated defaultBrokerPort and defaultZkPort properties.

    spring.cloud.stream.kafka.binder.brokers=host1:9092,host2:9092,host3
    spring.cloud.stream.kafka.binder.zkNodes=host1:2181,host2:2181,host3
    spring.cloud.stream.kafka.binder.defaultBrokerPort=9092
    spring.cloud.stream.kafka.binder.defaultZkPort=2181

  5. To increase the session duration and reduce the risk of unexpected logouts, add the following lines:

    security.token.renew-after=600
    security.token.invalid-after=3600

  6. Repeat this installation and configuration procedure for each instance of Talend Data Preparation that you want to install.
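
The port inheritance rule used in steps 2 and 4 can be illustrated with a short Python sketch. It is not part of the product: the expand_hosts function is a hypothetical helper that only mirrors the rule as described in this procedure (an entry without an explicit port takes the default port). It does not claim to reproduce how Talend Data Preparation reads the file internally.

```python
def expand_hosts(host_list, default_port):
    """Return (host, port) pairs for a comma-separated host list.

    Entries written as host:port keep their own port. Entries without
    a port inherit default_port, as described for the last host.
    """
    pairs = []
    for entry in host_list.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma or stray whitespace
        host, sep, port = entry.rpartition(":")
        if sep:
            pairs.append((host, int(port)))
        else:
            pairs.append((entry, int(default_port)))
    return pairs


if __name__ == "__main__":
    # Values taken from the Kafka example in step 4.
    print(expand_hosts("host1:9092,host2:9092,host3", 9092))
```

Running the helper on a value before pasting it into application.properties is one way to spot a missing comma or a mistyped port.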

The Talend Data Preparation instances are now installed and configured to work in cluster mode.
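
Because every instance must see the same Network File System or shared folder, it can be useful to confirm this before opening the cluster to users. The following Python sketch shows one possible check, under the assumption that Python is available on each node. The function names and the marker file name are hypothetical and are not provided by Talend.

```python
import os
import uuid


def write_marker(shared_dir):
    """Write a uniquely named marker file and return its name."""
    name = "cluster-check-" + uuid.uuid4().hex + ".tmp"
    with open(os.path.join(shared_dir, name), "w") as handle:
        handle.write(name)
    return name


def marker_visible(shared_dir, name):
    """Return True if the marker file can be read from this node."""
    try:
        with open(os.path.join(shared_dir, name)) as handle:
            return handle.read() == name
    except OSError:
        return False


def remove_marker(shared_dir, name):
    """Delete the marker file once the check is finished."""
    os.remove(os.path.join(shared_dir, name))
```

A possible way to use it: call write_marker on the first node with the directory set in service.cache.file.location, call marker_visible with the returned name on every other node, then call remove_marker to clean up.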

Installing Talend Dictionary Service in cluster mode

You can optionally install Talend Dictionary Service in cluster mode to add, remove, or edit the semantic types used on data in Talend Data Preparation.

To install Talend Dictionary Service in cluster mode, you need to modify the <Tomcat>/conf/data-quality.properties configuration file.

To perform this installation, you need to install and configure as many instances of Talend Dictionary Service, and its dependencies, as necessary.

Prerequisites:

  1. Install a first Talend Dictionary Service instance.

    For more information on the installation procedure, see Installing and configuring Talend Dictionary Service.

  2. In the <Tomcat>/conf/data-quality.properties file, edit the mongodb.host property to specify the hosts and ports of the several MongoDB instances.

    Use the following syntax:

    mongodb.host=<host1>:<port1>,<host2>:<port2>,…,<hostN>

    The hosts and ports of the different MongoDB instances must be concatenated in a comma-separated list. The last host is written without a port and inherits the value of the mongodb.port property. For example:

    mongodb.host=mongorep-mongodb-replica-1.mongorep-mongodb-replica.default.svc.cluster.local:27017,
    mongorep-mongodb-replica-0.mongorep-mongodb-replica.default.svc.cluster.local:27017,
    mongorep-mongodb-replica-2.mongorep-mongodb-replica.default.svc.cluster.local:27017,
    mongorep-mongodb-replica-3.mongorep-mongodb-replica.default.svc.cluster.local
    mongodb.port=27017

  3. Edit the properties specifying the hosts and ports for the Kafka and Zookeeper instances.

    As with the MongoDB hosts, the Kafka and Zookeeper hosts and ports must be concatenated in a comma-separated list. The last host is written without a port and inherits it from the dedicated defaultBrokerPort and defaultZkPort properties.

    spring.cloud.stream.kafka.binder.brokers=host1:9092,host2:9092,host3
    spring.cloud.stream.kafka.binder.zkNodes=host1:2181,host2:2181,host3
    spring.cloud.stream.kafka.binder.defaultBrokerPort=9092
    spring.cloud.stream.kafka.binder.defaultZkPort=2181

  4. Repeat this installation and configuration procedure for each instance of Talend Dictionary Service that you want to install.

You have installed several Talend Dictionary Service instances and configured them to work in cluster mode.
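
Since the same configuration is repeated on every instance, a typo on one node is easy to miss. The Python sketch below compares the cluster-related properties of two configuration files. It rests on two assumptions that are not stated by Talend: that these properties are expected to hold identical values on every instance, and that each value is written on a single line (the simple parser does not handle line continuations). The function names and the CLUSTER_KEYS list are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def differing_keys(first, second, keys):
    """Return the keys from `keys` whose values differ between two files."""
    a, b = parse_properties(first), parse_properties(second)
    return [key for key in keys if a.get(key) != b.get(key)]


# Property names taken from the procedure above.
CLUSTER_KEYS = [
    "mongodb.host",
    "mongodb.port",
    "spring.cloud.stream.kafka.binder.brokers",
    "spring.cloud.stream.kafka.binder.zkNodes",
    "spring.cloud.stream.kafka.binder.defaultBrokerPort",
    "spring.cloud.stream.kafka.binder.defaultZkPort",
]
```

To use it, read the content of data-quality.properties from two instances and pass both strings to differing_keys together with CLUSTER_KEYS. An empty list means that the listed properties match.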

Talend Data Preparation cluster mode limitations

When Talend Data Preparation is installed in cluster mode, users may occasionally be logged out of the interface unexpectedly, although the risk is minimal. See the corresponding Jira ticket: https://jira.talendforge.org/browse/TDP-3699.