Clustering is the process of grouping together a set of similar physical systems in order to ensure a level of operational continuity and minimize the risk of unplanned downtime, in particular by taking advantage of load balancing and failover features.
This section provides a high-level view of how to set up such a cluster of MDM Servers for Talend MDM and some information on how failover is handled in a cluster of MDM Servers.
Download and install Apache httpd with mod_jk support and make sure it is running properly. For more information about how to install and run Apache httpd, refer to the official Apache documentation.
Download and install the Apache ActiveMQ standard distribution and run it. For more information about how to install and run ActiveMQ on your platform, refer to the Apache ActiveMQ documentation.
To set up a cluster of MDM servers, do the following.
Install the first MDM Server as you would for an installation on a single machine.
Duplicate this first instance on as many machines as you want to include in your cluster. In this case, duplicate means rerun the installation process with exactly the same parameters each time.
Note that you can also work with multiple instances on the same physical server, using different port numbers, but in this case you do not have the same level of protection against the physical failure of a machine.
Edit the file <$INSTALLDIR>/conf/mdm.conf on each MDM server instance as follows:
Add the line system.cluster=true under the System Settings section to enable the clustering configuration.
Change the value of mdm.routing.engine.broker.url to tcp://AMQHOST:AMQPORT, for example, tcp://localhost:61616. Here AMQHOST is the name of the server hosting ActiveMQ, and AMQPORT is the OpenWire TCP port that ActiveMQ listens to.
By default, an MDM server uses an embedded Apache ActiveMQ broker as the JMS provider. In order to ensure correct communication between nodes, the JMS broker must be externalized and shared by every node of the cluster.
Add the following two lines to let MDM create authenticated connections to the ActiveMQ server.
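Taken together, the cluster-related entries in mdm.conf might look like the following sketch. The credential property names and all values shown are assumptions; check your own mdm.conf and your MDM version's documentation for the exact keys.

```properties
# System Settings
system.cluster=true

# Point the routing engine at the shared ActiveMQ broker
# (replace AMQHOST and 61616 with your broker's host and OpenWire port)
mdm.routing.engine.broker.url=tcp://AMQHOST:61616

# Hypothetical credential keys for authenticated broker connections;
# verify the exact property names against your mdm.conf
mdm.routing.engine.broker.userName=admin
mdm.routing.engine.broker.password=admin
```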
In the file <TomcatPath>/conf/server.xml, locate the <Engine> element and add an attribute for jvmRoute.
<Engine name="Catalina" defaultHost="localhost" jvmRoute="mdm_node1">
Here the value of jvmRoute represents the unique identifier of each MDM server node included in the cluster and must correspond to the worker name in the worker.properties file.
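For example, on a second node the same element would carry that node's worker name:

```xml
<!-- On the second MDM node: jvmRoute must match this node's worker name -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="mdm_node2">
```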
For a specific example about how to set up a load balancer using Apache httpd with mod_jk support, see An example of how to set up a load balancing solution using Apache httpd and mod_jk.
Repeat this step for each server instance.
Restart all the MDM nodes in the cluster.
Full-text index replication is implemented within each MDM cluster. For more information, see Full-text index replication.
Once you have installed and configured all the required MDM Server instances, you need to explicitly connect them together in a cluster. Different solutions exist for doing so, both hardware and software.
The following example shows one way of doing this by setting up a load balancing solution using mod_jk with Tomcat. It assumes that you already have some experience of working with httpd and have some knowledge of Tomcat and Tomcat connectors (mod_jk).
Such a cluster consists of one Apache server that dispatches all the incoming requests across the cluster, and two "nodes", which are different instances of MDM Server installed on the same machine.
An example of how to set up a load balancing solution using Apache httpd and mod_jk
To declare a cluster of MDM Servers on the Apache server that manages the load balancing tasks, do the following under the directory <Apache_home>/conf/. Note that the location of Apache_home depends on the operating system you are using and how you installed Apache.
Edit the configuration file httpd.conf and add the following lines:
JkMount /talendmdm/* loadbalancer
JkMountCopy all
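The JkMount directives assume mod_jk is already loaded and configured in httpd.conf. A minimal sketch of the surrounding directives follows; the module path, workers file location, and log settings are assumptions that depend on your platform and installation:

```apacheconf
# Load the mod_jk module (the path varies by platform and installation)
LoadModule jk_module modules/mod_jk.so
# Point mod_jk at the workers definition file
JkWorkersFile conf/worker.properties
# Optional: mod_jk log file and level
JkLogFile logs/mod_jk.log
JkLogLevel info
# Forward all MDM requests to the load balancer worker
JkMount /talendmdm/* loadbalancer
JkMountCopy all
```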
Create a new file worker.properties and populate it as follows:
Make sure the workers listed for worker.loadbalancer.balance_workers correspond to the names specified for jvmRoute in the file <TomcatPath>/conf/server.xml, because the Apache server dispatches requests based on the file worker.properties.
# Define mdm_node1
worker.mdm_node1.port=8109
worker.mdm_node1.host=127.0.0.1
worker.mdm_node1.lbfactor=1
worker.mdm_node1.type=ajp13
# Define mdm_node2
worker.mdm_node2.port=8009
worker.mdm_node2.host=127.0.0.1
worker.mdm_node2.lbfactor=1
worker.mdm_node2.type=ajp13
# Declare the load balancer itself and all the worker nodes
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=mdm_node1,mdm_node2
worker.list=mdm_node1,mdm_node2,loadbalancer
worker.loadbalancer.sticky_session=true
You can find the AJP port of each MDM server node in the file <TomcatPath>/conf/server.xml. One example is shown below:
<!-- Define an AJP 1.3 Connector on port 8109 -->
<Connector port="8109" protocol="AJP/1.3" redirectPort="8543" />
Restart the Apache server for the configuration to be taken into account.
MDM comes with a built-in full-text index replication based on JMS topics.
Each MDM server instance maintains its own full-text indexes. To maintain consistent indexes within the cluster, each change made on one node must be broadcasted to the other nodes so that each node applies modifications to its own indexes. This is called full-text index replication.
Suppose there are several MDM server nodes in a cluster. If a change affecting a full-text index needs to be performed on one node, the node will perform this change locally and then send a JMS message on a topic. When receiving the message, all the other nodes will perform the same change locally to ensure index consistency.
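The broadcast pattern described above can be sketched with a toy in-memory topic. This is a simplified illustration of the publish/subscribe flow, not Talend's actual implementation; the class and record names are invented for the example:

```python
class Topic:
    """Toy stand-in for a JMS topic: delivers each message to every subscriber."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, sender, message):
        # Every node except the sender applies the change
        for callback in self.subscribers:
            if callback.__self__ is not sender:
                callback(message)

class Node:
    """A cluster node that keeps its own local full-text index."""
    def __init__(self, name, topic):
        self.name = name
        self.index = {}
        self.topic = topic
        topic.subscribe(self.on_message)

    def update_index(self, key, value):
        # Apply the change locally, then broadcast it to the other nodes
        self.index[key] = value
        self.topic.publish(self, (key, value))

    def on_message(self, message):
        # A replication message arrived: apply the same change locally
        key, value = message
        self.index[key] = value

topic = Topic()
node1 = Node("mdm_node1", topic)
node2 = Node("mdm_node2", topic)
node1.update_index("record-42", "new content")
print(node2.index)  # node2's index now mirrors node1's
```

The essential point is that the change is applied exactly once per node: locally on the originating node, and via the topic message on every other node.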
This feature is enabled as soon as an MDM data source has full-text capability and system.cluster=true is added under the System Settings section in the file <$INSTALLDIR>/conf/mdm.conf.
When running with Apache ActiveMQ as the JMS broker, a dedicated JMS topic is used for full-text index replication.
To ensure full-text index replication is enabled, verify that the following log message is output to mdm.log during the MDM server node startup:
INFO [JmsIndexReplicationManagerFactory] JmsIndexReplicationManagerFactory initialized
To debug the sending and receiving of JMS messages, edit the file <TomcatPath>/webapps/talendmdm/WEB-INF/conf/log4j.xml and add the following information:
<category name="com.amalto.core.storage.hibernate.search.jms">
    <priority value="DEBUG"/>
</category>
Additionally, make sure that the logging level set on the appenders is at least DEBUG.
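For example, with a typical log4j 1.x file appender, the level is controlled by a Threshold parameter. The appender name and file shown here are assumptions; Talend's log4j.xml may define its appenders differently:

```xml
<!-- Hypothetical appender definition: the name and file are examples only -->
<appender name="FILE" class="org.apache.log4j.RollingFileAppender">
    <param name="File" value="mdm.log"/>
    <!-- Must be DEBUG (or lower) for the replication messages to appear -->
    <param name="Threshold" value="DEBUG"/>
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
    </layout>
</appender>
```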
Each time a replication message is received by a node, this log message is output to mdm.log:
DEBUG [JmsMessageListenerAdaptor] Received a message: [...]
This message may vary with your ActiveMQ version or system settings.
Each time a replication message is sent by a node, this log message is output to mdm.log:
DEBUG [JmsTopicLuceneWorkBroadcaster] JMS Message sent for index ...
This message may vary with your ActiveMQ version or system settings.
In a cluster of MDM Servers, each instance of MDM Server - that is to say, each node - is independent. As such, whenever a session is initiated on a particular node, it remains on that node. In other words, for that session, any HTTP requests coming from the same user are always sent to the same node.
The following table describes what happens when an individual node fails.
Talend MDM Web User Interface
Users currently connected on live nodes see no difference.
New users can connect normally.
Users currently connected on the failed node are disconnected from their session and redirected to the login page, as happens when a session expires.
Jobs
Jobs connected on live nodes finish normally.
Jobs connected on the failed node also fail if they use the tMDMConnection component. However, Talend Administration Center can rerun these Jobs immediately and route them to another node.
For Jobs which do not use the tMDMConnection component, only one record is rejected.
Triggers
The Event Manager queues ensure that all asynchronous Triggers eventually run.
Synchronous Triggers running on the failed node also fail.
Processes
All Processes on live nodes run normally.
Processes on the failed node also fail, causing the create, update or delete action to be rejected.