Important Configuration subjects - 6.4

Talend MDM Platform Installation Guide for Windows

The following sections describe various ways of configuring your MDM Server installation.

Clustering MDM Servers

Clustering is the process of grouping together a set of similar physical systems in order to ensure a level of operational continuity and minimize the risk of unplanned downtime, in particular by taking advantage of load balancing and failover features.

This section provides a high-level view of how to set up such a cluster of MDM Servers for Talend MDM and some information on how failover is handled in a cluster of MDM Servers.

Setting up a cluster of MDM Servers

Prerequisites:

  • Download and install Apache httpd with mod_jk support, and make sure it is running properly. For more information about how to install and run Apache httpd, refer to the official Apache documentation.

  • Download and install the Apache ActiveMQ standard distribution and run it. For more information about how to install and run ActiveMQ on your platform, refer to the Apache ActiveMQ documentation.

To set up a cluster of MDM servers, do the following.

  1. Install the first MDM Server as you would for an installation on a single machine.

  2. Duplicate this first instance on as many machines as you want to include in your cluster. In this case, duplicate means rerun the installation process with exactly the same parameters each time.

    Note that you can also work with multiple instances on the same physical server, using different port numbers, but in this case you do not have the same level of protection against the physical failure of a machine.

  3. Edit the file <$INSTALLDIR>/conf/mdm.conf on each MDM server instance as follows:

    • Add the line system.cluster=true under the System Settings section to enable the clustering configuration.

    • Change the value of mdm.routing.engine.broker.url to tcp://AMQHOST:AMQPORT, for example, tcp://localhost:61616. Here AMQHOST is the name of the server hosting ActiveMQ, and AMQPORT is the OpenWire TCP port that ActiveMQ listens to.

      Note

      By default, an MDM server uses an embedded Apache ActiveMQ broker as the JMS provider. In order to ensure correct communication between nodes, the JMS broker must be externalized and shared by every node of the cluster.

    • Add the following two lines to let MDM create authenticated connections to the ActiveMQ server.

      mdm.routing.engine.broker.userName=<USERNAME>
      mdm.routing.engine.broker.password=<PASSWORD>
  4. In the file <TomcatPath>/conf/server.xml, locate the <Engine> element and add an attribute for jvmRoute.

    <Engine name="Catalina" defaultHost="localhost" jvmRoute="mdm_node1">

    Here the value of jvmRoute represents the unique identifier of each MDM server node included in the cluster and must correspond to the worker name in the workers.properties file.

    For a specific example about how to set up a load balancer using Apache httpd with mod_jk support, see An example of how to set up a load balancing solution using Apache httpd and mod_jk.

    Repeat this step for each server instance.

  5. Restart all the MDM nodes in the cluster.
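Putting step 3 together, the cluster-related part of <$INSTALLDIR>/conf/mdm.conf on each node might look like the following sketch. The broker host, port, and credentials shown here are placeholders for your environment:

```properties
# System Settings
system.cluster=true

# Shared, external ActiveMQ broker (host, port, and credentials are placeholders)
mdm.routing.engine.broker.url=tcp://mq-host:61616
mdm.routing.engine.broker.userName=mdm_events
mdm.routing.engine.broker.password=secret
```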

Full-text index replication is implemented within each MDM cluster. For more information, see Full-text index replication.

Once you have installed and configured all the required MDM Server instances, you need to explicitly connect them together in a cluster. Different solutions exist for doing so, both hardware and software.

The following example shows one way of doing this by setting up a load balancing solution using mod_jk with Tomcat. It assumes that you already have some experience working with httpd and some knowledge of Tomcat and Tomcat connectors (mod_jk).

Such a cluster consists of one Apache server that dispatches all the incoming requests across the cluster, and two "nodes", which are different instances of MDM Server installed on the same machine.

An example of how to set up a load balancing solution using Apache httpd and mod_jk

To declare a cluster of MDM Servers on the Apache server that manages the load balancing tasks, do the following under the directory <Apache_home>/conf/. Note that the location of Apache_home depends on the operating system you are using and how you installed Apache.

  1. Edit the configuration file httpd.conf and add the following lines:

    JkMount /talendmdm/* loadbalancer
    JkMountCopy all
  2. Create a new file workers.properties and populate it as follows:

    Make sure the workers listed for worker.loadbalancer.balance_workers correspond to the names specified for jvmRoute in the file <TomcatPath>/conf/server.xml, because the Apache server dispatches requests based on the file workers.properties.

    # Define mdm_node1
    worker.mdm_node1.port=8109
    worker.mdm_node1.host=127.0.0.1
    worker.mdm_node1.lbfactor=1
    worker.mdm_node1.type=ajp13
    
    # Define mdm_node2
    worker.mdm_node2.port=8009
    worker.mdm_node2.host=127.0.0.1
    worker.mdm_node2.lbfactor=1
    worker.mdm_node2.type=ajp13
    
    # Declare the load balancer itself and all the worker nodes
    worker.loadbalancer.type=lb
    worker.loadbalancer.balance_workers=mdm_node1,mdm_node2
    worker.list=mdm_node1,mdm_node2,loadbalancer
    worker.loadbalancer.sticky_session=true

    Note

    You can find the AJP port of each MDM server node in the file <TomcatPath>/conf/server.xml. One example is shown below:

    <!-- Define an AJP 1.3 Connector on port 8109 -->
    <Connector port="8109" protocol="AJP/1.3" redirectPort="8543" />
  3. Restart the Apache server for the configuration to be taken into account.
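Note that the JkMount directives in step 1 take effect only if httpd.conf also loads mod_jk and declares the workers file. A minimal sketch, assuming the module is installed as modules/mod_jk.so and the workers file from step 2 is saved as conf/workers.properties (both paths are placeholders for your installation):

```apache
# Load the mod_jk module and point it at the workers definition file
LoadModule jk_module modules/mod_jk.so
JkWorkersFile conf/workers.properties
JkLogFile logs/mod_jk.log
JkLogLevel info
```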

Full-text index replication

MDM comes with built-in full-text index replication based on JMS topics.

Each MDM server instance maintains its own full-text indexes. To keep indexes consistent within the cluster, each change made on one node must be broadcast to the other nodes so that each node applies the modification to its own indexes. This is called full-text index replication.

Suppose there are several MDM server nodes in a cluster. If a change affecting a full-text index needs to be performed on one node, the node will perform this change locally and then send a JMS message on a topic. When receiving the message, all the other nodes will perform the same change locally to ensure index consistency.

This feature is enabled as soon as an MDM data source has full-text capability enabled and system.cluster=true is added under the System Settings section in the file <$INSTALLDIR>/conf/mdm.conf.

When running with Apache ActiveMQ as the JMS broker, the JMS topic used for full-text index replication is org.talend.mdm.server.index.replication.
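The mechanism described above can be sketched with a minimal in-process publish/subscribe model in Python. This is an illustration of the pattern only, not MDM code; the class and method names are invented for the example:

```python
# Minimal sketch of topic-based full-text index replication (illustration only).
# Each node applies a change to its own index first, then publishes it on a
# shared topic; every other subscriber applies the same change locally.

class Topic:
    """Stands in for the JMS topic shared by the cluster nodes."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, node):
        self.subscribers.append(node)

    def publish(self, sender, key, value):
        for node in self.subscribers:
            if node is not sender:  # the sender has already applied the change
                node.on_message(key, value)

class Node:
    """Stands in for one MDM server instance with its own local index."""
    def __init__(self, name, topic):
        self.name = name
        self.index = {}
        self.topic = topic
        topic.subscribe(self)

    def update(self, key, value):
        self.index[key] = value               # apply the change locally first
        self.topic.publish(self, key, value)  # then broadcast it to the peers

    def on_message(self, key, value):
        self.index[key] = value               # replay the remote change locally

topic = Topic()
n1 = Node("mdm_node1", topic)
n2 = Node("mdm_node2", topic)
n1.update("Product/1", "indexed")
print(n2.index)  # → {'Product/1': 'indexed'}
```

In the real cluster the topic lives on the shared ActiveMQ broker, so nodes never need to know about each other directly; they only share the broker.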

Troubleshooting for full-text index replication

To ensure full-text index replication is enabled, verify that the following log message is output to mdm.log during the MDM server node startup:

INFO [JmsIndexReplicationManagerFactory] JmsIndexReplicationManagerFactory initialized

To debug the sending and receiving of JMS messages, edit the file <TomcatPath>/webapps/talendmdm/WEB-INF/conf/log4j.xml and add the following information:

<category name="com.amalto.core.storage.hibernate.search.jms">
   <priority value="DEBUG"/>
</category>

Additionally, make sure that the value of the "Threshold" parameter is at least DEBUG in the appenders.

Each time a replication message is received by a node, this log message is output to mdm.log:

DEBUG [JmsMessageListenerAdaptor] Received a message: [...]

This message may vary with your ActiveMQ version or system settings.

Each time a replication message is sent by a node, this log message is output to mdm.log:

DEBUG [JmsTopicLuceneWorkBroadcaster] JMS Message sent for index ...

This message may vary with your ActiveMQ version or system settings.

Handling failover

In a cluster of MDM Servers, each instance of MDM Server - that is to say, each node - is independent. As such, whenever a session is initiated on a particular node, it remains on that node. In other words, for that session, any HTTP requests coming from the same user are always sent to the same node.

The following describes, for each source, what happens when an individual node fails.

  • Talend MDM Web User Interface

    On failover: Users currently connected on live nodes see no difference, and new users can connect normally.

    Limitations: Users currently connected on the failed node are disconnected from their session and redirected to the login page, as happens when a session expires.

  • Running Jobs

    On failover: Jobs connected on live nodes finish normally.

    Limitations: Jobs connected on the failed node will also fail if they use the tMDMConnection component. However, Talend Administration Center can rerun the Jobs immediately and route them to another node. For Jobs which do not use the tMDMConnection component, only one record is rejected.

  • Triggers

    On failover: The Event Manager queues ensure that all asynchronous Triggers eventually run.

    Limitations: Synchronous Triggers running on the failed node also fail.

  • beforeSaving/beforeDeleting Processes

    On failover: All Processes on live nodes run normally.

    Limitations: Processes on the failed node also fail, causing the create, update or delete action to be rejected.

Configuring an auto increment generator based on Hazelcast in a cluster

For a cluster of MDM servers, an auto increment generator based on Hazelcast, which is a distributed in-memory data grid, is initialized automatically during the MDM server startup.

This implementation can boost performance for auto increment generation in cluster mode. If needed, you can customize its relevant basic or advanced configuration.

Note that Hazelcast will never be initialized if you set system.cluster to false in the file <$INSTALLDIR>/conf/mdm.conf.

Basic configuration

Talend MDM allows you to change the basic Hazelcast configuration.

  1. Browse to the file <$INSTALLDIR>/conf/mdm.conf and open it.

  2. Edit the basic Hazelcast settings according to your needs.

    By default, the TCP/IP member discovery mechanism is configured for Hazelcast. You do not have to list the hostnames or IP addresses of all cluster members, but at least one of the listed members must be active in the cluster when a new member joins.

    hz.group.name=dev
    hz.group.password=password
    hz.network.port=5705
    hz.network.port-auto-increment=true
    hz.multicast.enabled=false
    hz.tcp-ip.enabled=true
    #Write comma-separated IP addresses, i.e. 192.168.100.10, 192.168.100.11
    hz.members=127.0.0.1

    The properties are explained below:

    • hz.group.name: Defines the name of a cluster group.

    • hz.group.password: Defines the password of a cluster group. The password is encrypted during the MDM server startup.

    • hz.network.port: Specifies the port that Hazelcast uses to communicate between cluster members.

    • hz.network.port-auto-increment: Indicates whether to enable auto increment of the port specified by hz.network.port. In the example above, Hazelcast tries to find a free port between 5705 and 5805.

    • hz.multicast.enabled: Indicates whether multicast discovery is enabled. Its value can be true or false.

    • hz.tcp-ip.enabled: Indicates whether TCP/IP discovery is enabled. Its value can be true or false.

    • hz.members: Lists the IP addresses of one or more well-known members. Once new members connect to any of these, the addresses of all members are shared across the cluster.

  3. Save your changes to the file.
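The effect of hz.network.port-auto-increment can be illustrated with a short Python sketch (an illustration of the behavior only, not Hazelcast code): if the configured port is already taken, the next port is tried, up to 100 increments.

```python
import socket

def find_free_port(base=5705, max_increment=100):
    """Mimic Hazelcast port auto-increment: try base, base+1, ..., base+max_increment."""
    for port in range(base, base + max_increment + 1):
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.bind(("127.0.0.1", port))  # binding succeeds only if the port is free
                return port
        except OSError:
            continue  # port is in use, try the next one
    raise RuntimeError("no free port between %d and %d" % (base, base + max_increment))

print(find_free_port())  # 5705 on an idle machine, otherwise the next free port
```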

Advanced configuration

If the basic Hazelcast configuration in <$INSTALLDIR>/conf/mdm.conf does not cover your needs, you can apply the advanced configuration:

  1. Browse to the file <TomcatPath>/webapps/talendmdm/WEB-INF/beans.xml and open it.

  2. Locate the Hazelcast configuration part.

     <hz:config id="hzConfig">
            <hz:network port="${hz.network.port}" port-auto-increment="${hz.network.port-auto-increment}">
                <hz:join>
                    <hz:multicast enabled="${hz.multicast.enabled}"/>
                    <hz:tcp-ip enabled="${hz.tcp-ip.enabled}">
                        <hz:members>${hz.members}</hz:members>
                    </hz:tcp-ip>
                </hz:join>
            </hz:network>
        </hz:config>

    Note that the values of the parameters provided in beans.xml will be fetched from their counterparts defined in the mdm.conf file.

  3. Update the file beans.xml with additional Hazelcast settings according to your needs. For more information, refer to http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#discovering-cluster-members.

  4. Save your changes to the file beans.xml.

Configuring SSL support for the MDM server

You can configure the MDM Server to run securely on an HTTP server using the Secure Sockets Layer (SSL) protocol.

Configuring the MDM server to use SSL

To ensure a secure communication environment, you can configure Secure Sockets Layer (SSL) support on Tomcat.

Note

It is recommended to configure Tomcat with SSL support only when running Tomcat as a standalone web server. It is not necessary to configure SSL support when Tomcat runs behind another web server such as Apache.

Prerequisites:

  • JRE 1.8.0 or higher must be installed. Make sure that the JAVA_HOME environment variable is set to point to the JRE directory. For example, if the path is C:\Java\JREx.x.x\bin, you must set the JAVA_HOME environment variable to point to: C:\Java\JREx.x.x.

  • You have a keystore file containing a self-signed certificate for SSL. For more information about how to generate a keystore file, see How to generate a keystore file.

  1. Browse to the directory <TomcatPath>/conf, and then open the file server.xml.

  2. Uncomment the following text.

     <!--
        <Connector port="8543" protocol="org.apache.coyote.http11.Http11NioProtocol"
                   maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
                   clientAuth="false" sslProtocol="TLS" />
        -->
  3. Add the complete path to the keystore file and the password for the keystore file.

    <Connector port="8543" protocol="org.apache.coyote.http11.Http11NioProtocol"
                   maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
                   keystoreFile="${user.home}/.keystore" keystorePass="changeit"
                   clientAuth="false" sslProtocol="TLS" />
    

    Warning

    Make sure that the keystoreFile contains the path and file name of the keystore, and the keystorePass matches the password for the keystore.

  4. Save your changes into the file.

  5. Restart Tomcat to take into account your updates.

How to generate a keystore file

The following gives an example of how to generate a self-signed certificate using Java Keytool.

  1. Open a command prompt window.

  2. Run the following command to generate a new file named ".keystore" in your home directory.

    "%JAVA_HOME%\bin\keytool" -genkey -alias tomcat -keyalg RSA

    Note

    If you want to specify a different location or file name, add the -keystore parameter to the command. For example, you can add -keystore talendmdm.keystore to the command to generate a keystore file named talendmdm.keystore.

  3. Enter the keystore password as prompted, and then enter the password again to confirm it. By default, it is "changeit".

  4. Enter the general information about this certificate, such as the organization name or the city. Make sure that the information you enter matches the information expected by users who attempt to access a secure page in your application.

  5. Enter the key password as prompted, which is the password specifically for this certificate.

    Warning

    It is recommended to use the same password for the keystore file and the key.

  6. Go to your home directory and verify that a .keystore file is newly generated.

Configuring the MDM server to respond to HTTPS requests only

With the SSL support configuration, an MDM server can respond to both HTTP and HTTPS requests.

To ensure the security of communication with the MDM server, you can configure the MDM server to respond to HTTPS requests only by modifying the file web.xml under the directory <TomcatPath>/webapps/talendmdm/WEB-INF.

Open the file web.xml, and then uncomment the following text:

    <!-- Uncomment the following to configure webapp to always require HTTPS -->
    <security-constraint>
        <web-resource-collection>
            <web-resource-name>HTTPSOnly</web-resource-name>
            <url-pattern>/*</url-pattern>
        </web-resource-collection>
        <user-data-constraint>
            <transport-guarantee>CONFIDENTIAL</transport-guarantee>
        </user-data-constraint>
    </security-constraint>

After this configuration, if you enter a URL starting with http in your browser, you are automatically redirected to the secure URL starting with https.

Configuring a client to communicate with the MDM server using SSL

If the MDM server is configured with SSL support, any Java client must provide a TrustStore to verify the server's certificate in order to communicate with the MDM server using SSL.

For more information about how to configure the studio to communicate with the MDM server using SSL, see the section on SSL preference settings of your Talend Studio User Guide.

The following example shows how to configure SSL support for the MDM server from a standalone installation of the Bonita BPM server.

  1. Browse to the directory <Bonita_Home>/bin. Here Bonita_Home indicates the directory where the Bonita server has been installed manually.

    For more information about how to install the Bonita server manually, see Installing the Bonita BPM server manually.

  2. Edit the file setenv.bat by adding the following line:

    -Djavax.net.ssl.trustStore=<full path to the keystore file> -Djavax.net.ssl.trustStorePassword=<password of the keystore file>
  3. Save your changes into the file.

Configuring session timeout for the Web User Interface

The user session timeout for Talend MDM Web User Interface is set to 30 minutes by default. After 30 minutes of inactivity, the business user or data steward is redirected to the login page of the Web User Interface.

You can always change this session timeout, if required.

To set up a new timeout for users connecting to the Web User Interface, complete the following:

  1. In the Tomcat folder, browse to the file \webapps\talendmdm\WEB-INF\web.xml.

  2. Open the web.xml file in a text editor and search for the following tag:

    <session-config>
        <session-timeout>30</session-timeout>
        <tracking-mode>COOKIE</tracking-mode>
    </session-config>
  3. Change the value of the default session timeout as desired.

  4. Save your modifications.

The new session timeout parameter has been set for users connecting to the Web User Interface.

Changing the default ports used by the MDM server in Tomcat

During the MDM server installation, under the directory <TomcatPath>\conf\, the file server.xml is configured to use one set of ports for the MDM server. One example is shown below:

<Server port="8105" shutdown="SHUTDOWN">
    ...
    <Service name="Catalina">
        <Connector port="8180" protocol="HTTP/1.1"
                   connectionTimeout="20000"
                   redirectPort="8543" />
        <Connector port="8109" protocol="AJP/1.3" redirectPort="8543" />
    ...
    </Service>
</Server>
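As a quick sanity check, the ports actually configured can be read out of server.xml with a few lines of Python. The XML fragment is inlined here for illustration; in practice you would read <TomcatPath>\conf\server.xml from disk:

```python
import xml.etree.ElementTree as ET

# Inlined copy of the server.xml fragment shown above (illustration only).
server_xml = """
<Server port="8105" shutdown="SHUTDOWN">
    <Service name="Catalina">
        <Connector port="8180" protocol="HTTP/1.1"
                   connectionTimeout="20000" redirectPort="8543"/>
        <Connector port="8109" protocol="AJP/1.3" redirectPort="8543"/>
    </Service>
</Server>
"""

root = ET.fromstring(server_xml)
# Map each connector's protocol to its port.
ports = {c.get("protocol"): c.get("port") for c in root.iter("Connector")}
print(root.get("port"), ports)  # → 8105 {'HTTP/1.1': '8180', 'AJP/1.3': '8109'}
```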

If needed, you can customize the set of ports to be used. Three sets of predefined ports are available, each of which provides a set of different ports, as shown in the table below:

Port Set   mdm.port.http   mdm.port.https   mdm.port.ajp   mdm.port.shutdown
Default    8180            8543             8109           8105
Set 1      8080            8443             8009           8005
Set 2      8280            8643             8209           8205
Set 3      8380            8743             8309           8305

For more information on configuring the MDM server, see Configuring MDM Server.

Encrypting the passwords using the CommandLine

The file encrypt.bat (Windows) or encrypt.sh (Linux) under the directory <MDM_ROOT>/tools/encrypt enables you to encrypt plain text passwords from the CommandLine.

After that, the encrypted passwords can be used in configuration files directly. For more information, see Managing the passwords in configuration files.

The following example shows how to encrypt a password using the file encrypt.bat:

  1. Open the CommandLine.

    For more information, see Cheatsheet: start and stop commands for Talend server modules.

  2. Navigate to the directory containing the file encrypt.bat.

  3. Run the command encrypt <your_password> to encrypt your password.

    The encrypted password is displayed in the CommandLine accordingly.

Configuring the MDM server for bulk load operations

When bulk loading large volumes of data into MDM, you may encounter transaction and deadlock issues because too many threads on the server side are trying to obtain database connections.

You can configure the MDM server to avoid such issues:

  1. Browse to the file <$INSTALLDIR>/conf/mdm.conf and open it.

  2. Configure the following two parameters according to your needs:

    #To avoid connection pool and database overload, default value is 25
    bulkload.concurrent.database.requests=30
    
    #Control how many milliseconds to wait before retry, default value is 200
    bulkload.concurrent.wait.milliseconds=300

    In this example, we set the maximum number of concurrent database requests to 30 and the request retry wait time to 300 milliseconds.

  3. Save your changes into the file.
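The semantics of the two parameters can be sketched in Python (an illustration only, not MDM source code): a cap on concurrent database requests, with a fixed wait before each retry when the cap is reached.

```python
import threading
import time

MAX_CONCURRENT = 3  # plays the role of bulkload.concurrent.database.requests
WAIT_MS = 50        # plays the role of bulkload.concurrent.wait.milliseconds

connections = threading.BoundedSemaphore(MAX_CONCURRENT)
completed = []
completed_lock = threading.Lock()

def load_chunk(chunk_id):
    # Instead of piling up on the connection pool, back off and retry.
    while not connections.acquire(blocking=False):
        time.sleep(WAIT_MS / 1000.0)
    try:
        time.sleep(0.01)  # simulate the database work for one chunk
        with completed_lock:
            completed.append(chunk_id)
    finally:
        connections.release()

threads = [threading.Thread(target=load_chunk, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(completed))  # → 10: all chunks load despite the concurrency cap
```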

Enabling and configuring the integration of Talend Data Stewardship with MDM

Talend Data Stewardship can be integrated with MDM to replace Talend Data Stewardship Console for performing integrated matching tasks. Moreover, Talend MDM allows you to switch between Talend Data Stewardship Console and Talend Data Stewardship.

To enable the integration of Talend Data Stewardship with MDM, you need to make the following configurations for the MDM server:

  1. Browse to the file <$INSTALLDIR>/conf/mdm.conf and open it.

  2. Uncomment the properties related to Talend Data Stewardship and configure them according to your needs. For example,

    # TDS settings
    ######################################################
    tds.root.url=http://localhost:19999
    tds.user=owner1@company.com
    tds.password=owner1
    tds.core.url=/data-stewardship
    tds.schema.url=/schemaservice
    tds.api.version=/api/v1
    tds.batchsize=50

    The values of the three properties tds.core.url, tds.schema.url and tds.api.version must remain unchanged, and the other properties are explained below:

    • tds.root.url: Indicates the URL used to access Talend Data Stewardship, including the port.

    • tds.user: Indicates the username for accessing Talend Data Stewardship. Note that Talend Data Stewardship uses Talend Administration Center as the authentication provider, and usernames are always in the form of an email address. The value of this property must be a valid username of a Talend Administration Center user who also serves as a Data Stewardship User and has the Campaign Owner data stewardship role. For more information, see the Talend Data Stewardship documentation.

    • tds.password: Indicates the password for accessing Talend Data Stewardship. Its value must be a valid password corresponding to the Talend Administration Center user mentioned above.

    • tds.batchsize: Indicates the task creation or query batch size.

  3. Save your changes into the file.