You can install several instances of Talend Data Stewardship in cluster mode to benefit from high availability and better scalability.
Clustering is the process of grouping together a set of similar physical systems in order to ensure a level of operational continuity and minimize the risk of unplanned downtime, in particular by taking advantage of load balancing and failover features.
This documentation provides the procedures to set up a cluster for Talend Data Stewardship, and optionally Talend Dictionary Service.
The following diagram illustrates the architecture behind Talend Data Stewardship and Talend Dictionary Service when set up in cluster mode.
This architecture is composed of several functional blocks:
A Load Balancer, which distributes the workload generated by users who simultaneously access the Talend Data Stewardship Web application and the Talend Dictionary Service server.
The Talend Data Stewardship instances.
The Talend Dictionary Service instances that you can optionally install if you want to add, remove, or edit the semantic types used on data in Talend Data Stewardship.
A block containing the various components necessary for Talend Data Stewardship and Talend Dictionary Service to work, namely several instances of MongoDB for storage, Kafka and Zookeeper for messaging, and an instance of Talend Administration Center to manage authorizations.
To install Talend Data Stewardship in cluster mode, you need to make some modifications in the <Data_Stewardship_Path>/tds/apache-tomcat/conf/data-stewardship.properties configuration file.
To perform this installation, you need to install and configure as many instances of Talend Data Stewardship and its dependencies as necessary.
You have configured a Load Balancer for each module, namely Talend Data Stewardship and optionally Talend Dictionary Service.
You have configured MongoDB in cluster mode. For more information, see the MongoDB documentation.
Install a first Talend Data Stewardship instance.
For more information on the installation procedure, see Installing and configuring Talend Data Stewardship.
In the <Data_Stewardship_Path>/tds/apache-tomcat/conf/data-stewardship.properties file, edit the mongodb.host property to specify the hosts and ports of the MongoDB instances.
Use the following syntax:
The hosts and ports of the different instances must be concatenated, separated by commas. The last host is written without a port, because it inherits the value of the mongodb.port property. For example:
spring.data.mongodb.host=mongorep-mongodb-replica-1.mongorep-mongodbreplica.default.svc.cluster.local:27017,mongorep-mongodb-replica-0.mongorep-mongodbreplica.default.svc.cluster.local:27017,mongorep-mongodb-replica-2.mongorep-mongodbreplica.default.svc.cluster.local:27017,mongorep-mongodb-replica-3.mongorep-mongodbreplica.default.svc.cluster.local
spring.data.mongodb.port=27017
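The concatenation rule above can be sketched as a small Python helper. This is only an illustration, not part of the product: the function name build_host_properties is hypothetical, the key spring.data.mongodb.port is assumed from the naming used in the example, and the sketch assumes that all instances listen on the same port.

```python
def build_host_properties(hosts, port,
                          host_key="spring.data.mongodb.host",
                          port_key="spring.data.mongodb.port"):
    """Return the host and port property lines for hosts sharing one port.

    Hypothetical helper: the default property keys are assumptions based on
    the example in this documentation.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # Every host except the last carries an explicit ":port" suffix.
    explicit = [f"{host}:{port}" for host in hosts[:-1]]
    # The last host is written bare; its port comes from the port property.
    value = ",".join(explicit + [hosts[-1]])
    return [f"{host_key}={value}", f"{port_key}={port}"]


if __name__ == "__main__":
    # Placeholder host names, for illustration only.
    for line in build_host_properties(["mongo1", "mongo2", "mongo3"], 27017):
        print(line)
```

Generating the value from a list rather than typing it by hand makes it harder to leave a stray space or a missing comma in a long host string.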
Edit the properties specifying the hosts and ports for the Kafka and Zookeeper instances.
As with the MongoDB URLs, the Kafka and Zookeeper hosts and ports must be concatenated, except for the last port, which is inherited from the dedicated talend.kafka.port and talend.zookeeper.port properties.
talend.kafka.brokers=host1:9092,host2:9092,host3
talend.kafka.port=9092
talend.zookeeper.nodes=host1:2181,host2:2181,host3
talend.zookeeper.port=2181
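To check a concatenated value before restarting, it can help to expand it back into explicit host and port pairs. The Python sketch below is a reader-side check only: the function name expand_nodes is hypothetical, and the logic mirrors the inheritance rule described above rather than the parsing that Talend Data Stewardship actually performs.

```python
def expand_nodes(value, default_port):
    """Resolve a concatenated host list into (host, port) pairs.

    Entries without an explicit port fall back to default_port, which stands
    for the dedicated port property (for example talend.kafka.port).
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries caused by a trailing comma
        host, sep, port = entry.rpartition(":")
        if sep:
            pairs.append((host, int(port)))
        else:
            # No explicit port: inherit the value of the port property.
            pairs.append((entry, int(default_port)))
    return pairs


if __name__ == "__main__":
    print(expand_nodes("host1:9092,host2:9092,host3", 9092))
    print(expand_nodes("host1:2181,host2:2181,host3", 2181))
```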
Also specify the peer port parameters below, which identify the host name and the port number.
To increase the session duration and reduce the risk of unexpected logouts, add the following lines:
Repeat the above steps to install and configure the other instances of Talend Data Stewardship. For each instance, increment the value of the service.instance.id parameter in <Data_Stewardship_Path>/tds/apache-tomcat/conf/data-stewardship.properties so that every instance has a unique identifier.
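When many instances are configured, the identifier can be set programmatically instead of by hand. The following Python sketch works on the text of a properties file. The helper name with_instance_id is hypothetical, and the numeric identifiers are an assumption based on the instruction to increment the value.

```python
import re

# Matches an existing service.instance.id line anywhere in the file.
_INSTANCE_ID = re.compile(r"^service\.instance\.id\s*=.*$", re.MULTILINE)


def with_instance_id(properties_text, instance_id):
    """Return properties_text with service.instance.id set to instance_id."""
    new_line = f"service.instance.id={instance_id}"
    if _INSTANCE_ID.search(properties_text):
        # Replace the first existing definition; other lines are untouched.
        return _INSTANCE_ID.sub(lambda match: new_line, properties_text, count=1)
    # Property absent: append it, adding a line break first if needed.
    needs_break = bool(properties_text) and not properties_text.endswith("\n")
    return properties_text + ("\n" if needs_break else "") + new_line + "\n"


if __name__ == "__main__":
    base = "talend.kafka.port=9092\nservice.instance.id=1\n"
    # One variant of the configuration per instance, numbered 1 to 3.
    for instance_id in range(1, 4):
        print(with_instance_id(base, instance_id))
```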
Create partitions for Kafka topics in each Talend Data Stewardship instance:
Launch a Talend Data Stewardship instance. This automatically creates several Kafka topics.
Stop the instance and define the partitions for each topic manually. You need to define as many partitions as there are Kafka nodes.
For further information, see the Kafka documentation.
Restart the instance.
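The partition step can be prepared with a short script that builds one command per topic. The Python sketch below only assembles the commands and does not run them. It assumes a Kafka version whose kafka-topics.sh tool accepts --bootstrap-server (older versions expect --zookeeper instead), and the topic name is a placeholder: use the names of the topics that were actually created at the first launch.

```python
import shlex


def partition_commands(topics, kafka_nodes, script="kafka-topics.sh"):
    """Build one 'alter partitions' command per topic, as argument lists.

    The partition count equals the number of Kafka nodes, as required above.
    """
    if not kafka_nodes:
        raise ValueError("at least one Kafka node is required")
    partitions = len(kafka_nodes)
    bootstrap = ",".join(kafka_nodes)
    return [
        [script, "--bootstrap-server", bootstrap, "--alter",
         "--topic", topic, "--partitions", str(partitions)]
        for topic in topics
    ]


if __name__ == "__main__":
    nodes = ["host1:9092", "host2:9092", "host3:9092"]
    # "example-topic" is a hypothetical name used for illustration.
    for command in partition_commands(["example-topic"], nodes):
        print(shlex.join(command))
```

Note that Kafka only allows the number of partitions of an existing topic to be increased, so the commands have no effect on topics that already have at least that many partitions.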
You have installed several Talend Data Stewardship instances and configured them to work in cluster mode.
When you install Talend Data Stewardship in cluster mode, unexpected logouts from the interface may occasionally happen, although the risk is minimal. See the corresponding Jira ticket https://jira.talendforge.org/browse/TDS-1974.
You may also need to restart Talend Data Stewardship so that new Kafka consumer groups are taken into account. See the corresponding Jira ticket https://jira.talendforge.org/browse/TDS-1975.