Talend Data Catalog in active-passive cluster mode - 8.0

Talend Data Catalog Installation and Upgrade Guide

With a Talend Data Catalog Advanced or Advanced Plus license edition, you can install a two-server, active-passive configuration that relies on a distributed database to provide high availability for your product.

Clustering is the process of grouping together a set of similar physical systems in order to ensure a level of operational continuity and minimize the risk of unplanned downtime, in particular by taking advantage of failover features.

Failover allows you to automatically switch to a secondary server if the primary server is down or temporarily unreachable.
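The failover decision itself belongs to the third-party high availability software, not to Talend Data Catalog. As a minimal Python sketch of one common policy, assuming the software polls the primary server and records each health check as passed or failed, the hypothetical function `should_fail_over` triggers failover only after a configurable number of consecutive failed checks:

```python
from typing import Iterable


def should_fail_over(health_checks: Iterable[bool], max_failures: int = 3) -> bool:
    """Return True once the primary fails max_failures consecutive health checks.

    Requiring several consecutive failures (an assumed policy, not a Talend
    setting) avoids switching servers because of a single dropped check.
    """
    if max_failures < 1:
        raise ValueError("max_failures must be at least 1")
    consecutive = 0
    for healthy in health_checks:
        # A successful check resets the counter; a failed one increments it.
        consecutive = 0 if healthy else consecutive + 1
        if consecutive >= max_failures:
            return True
    return False
```

For example, `should_fail_over([True, False, False, False])` returns `True` with the default threshold of three, while a failed check followed by a successful one resets the count. The threshold trades detection speed against the risk of switching servers during a brief network glitch.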

Note: You cannot install a cluster configuration on the server itself.

Architecture of Talend Data Catalog in active-passive cluster mode

The following diagram illustrates the architecture behind Talend Data Catalog when set up in cluster mode.

This architecture is composed of several functional blocks:

  • Two Talend Data Catalog application servers are installed on different machines. Each server instance hosts an identical Apache Tomcat server installation and is connected to the shared file server. Only one server runs at a time: the active server. The other server is passive and does not access the shared file server.

    You can get a license that works for both servers by providing two HostInfo.xml files, one for each server, in your license request.

  • All instances of the application server are connected to the distributed database.

    For more information, refer to your corresponding database vendor documentation.

  • Third-party high availability software is installed on each instance. It detects when the primary server is down and starts the secondary server. Before starting the secondary server, the high availability system must unlock all the files in the data directory.

    This feature is not provided by Talend and needs to be implemented separately.

  • A shared file server stores all application data, including the data directory and log files, and shares it between the instances. You can define the data directory with the M_DATA_DIRECTORY parameter in the <TDC_HOME>/conf/conf.properties file or with the Data Directory field in the Setup utility.

    The Talend Data Catalog server locks files in the data directory when it accesses them and unlocks them when it is done. If the primary server goes down while it still holds locks on some files, the secondary server fails to start because it must access these files. You can implement a script that unlocks the files in the data directory before the secondary server starts.

    This feature is not provided by Talend and needs to be implemented separately.
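The ordering constraint above, releasing the data directory locks before the secondary server starts, can be sketched in Python as a hook that your high availability software might call. This is a minimal sketch under stated assumptions: `parse_data_directory`, `read_data_directory`, and `fail_over` are hypothetical names, the parser assumes `conf.properties` uses simple `key=value` lines, and the actual lock-release and server-start steps are passed in as callables because Talend does not provide them and they depend on your file server and high availability product.

```python
from pathlib import Path
from typing import Callable, Optional


def parse_data_directory(properties_text: str) -> Optional[str]:
    """Return the value of M_DATA_DIRECTORY from key=value properties text.

    Simplified parser (an assumption): skips blank lines and '#'/'!' comments,
    splits on the first '=', and does not handle ':' separators, backslash
    escapes, or line continuations.
    """
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "M_DATA_DIRECTORY":
            return value.strip()
    return None


def read_data_directory(conf_file: Path) -> Optional[str]:
    """Read <TDC_HOME>/conf/conf.properties and extract M_DATA_DIRECTORY."""
    return parse_data_directory(conf_file.read_text(encoding="utf-8"))


def fail_over(data_dir: Path,
              release_locks: Callable[[Path], None],
              start_secondary: Callable[[], None]) -> None:
    """Hypothetical failover hook: unlock the data directory, then start the secondary.

    If release_locks raises, the exception propagates and start_secondary
    is never called.
    """
    release_locks(data_dir)
    start_secondary()
```

Passing the unlock and start steps as functions keeps the ordering in one place: if the unlock step raises an error, the secondary server is never started against files that the primary server still holds.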