Building high availability for Talend Data Integration with Microsoft Azure

Author: Alban Leduc
Version: 6.5
Products: Talend Real-Time Big Data Platform, Talend Data Integration, Talend Data Fabric, Talend Big Data, Talend Big Data Platform, Talend Data Services Platform, Talend Data Management Platform, Talend MDM Platform
Task: Design and Development
Platform: Talend Administration Center, Talend JobServer

This article shows how to provide high availability for Talend Data Integration in a Microsoft Azure production environment.

Overview

This article explains how to use:

  • Azure Virtual Machines to create four virtual machines.
  • Azure Storage to share files between the two Talend Administration Center instances.
  • Azure Database for MySQL to host the Talend Administration Center database.
  • Azure Load Balancer to fail over from one Talend Administration Center instance to the other.

Note that the high availability of Talend Administration Center covers only the scheduling of task executions.

You will build an architecture with:

  • Two Talend Administration Center instances
  • Two Talend JobServer instances
  • One Azure Storage account
  • One Azure Database for MySQL server
  • One Azure Load Balancer

Architecture

Prerequisites

Microsoft Azure

  • You have access to Microsoft Azure with a valid account (https://portal.azure.com).
  • You have full access to the Azure services described in the overview section above.

Talend

  • You are familiar with the installation and the management of Talend Data Integration.

Environment

In this example, you implement the architecture on Red Hat Enterprise Linux 7.3, a standard image available on Microsoft Azure.

You can also implement it on another Linux distribution or on Windows Server.

Creating virtual machines with Azure for high availability of Talend Administration Center

This section explains how to create Azure virtual machines running Red Hat Enterprise Linux 7.3.

Procedure

  1. Go to the Azure Portal: https://portal.azure.com/
  2. Click Add on the Virtual Machines tab.

    For more information on how to create an Azure virtual machine, see Create a Windows virtual machine with the Azure portal.

  3. Choose Red Hat Enterprise Linux in the list of images proposed by Azure, and select version 7.3.

    You can choose another operating system according to your needs.

  4. Configure the basic settings as follows.
    • Name: Define the name of your virtual machine, aleducTac1, for example.
    • VM disk type: Define the type of disk: SSD or HDD.
    • User name: Define the user name you want to use to log in to the virtual machine.
    • Authentication type: Define the type of authentication: SSH public key or Password.
    • Resource group: Select your resource group. If you do not have an existing resource group, create one. Keeping all the resources of this architecture in one resource group makes your Azure environment easier to manage.
    • Location: Select the location that matches your area.

  5. Select the size of your virtual machine. In this example, choose a small size.

    This step sets the number of CPUs and the amount of memory. The right size depends on your environment. For more information, see Sizes for Windows virtual machines in Azure.

  6. Configure the optional features as follows.
    • High Availability: Choose or create an Availability Set. With an availability set, Azure distributes the virtual machines across separate physical hardware (servers, racks, storage units), so that a single hardware failure does not stop all the instances at once.
    • Virtual network: Keep the default value. The virtual network connects the virtual machines of your architecture to each other.
    • Public IP address: Define the public IP address. In this case, create a static IP. You need a static IP because the Azure Database for MySQL firewall only accepts connections from the IP addresses you register, so the IP address of each Talend Administration Center instance must not change.
    • Network security group: Define the network security group (NSG). All the virtual machines of your architecture use this NSG, which holds all the security rules of the architecture (inbound and outbound security rules). For more information, see Filter network traffic with network security groups.
    • Diagnostics storage account: Select the Azure storage account where diagnostics data will be stored.
  7. Once Azure validates the configuration, click Create to deploy your virtual machine.

Results

The virtual machine is now running.
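If you prefer scripting to the portal, the procedure above can be sketched with the Azure CLI. This is a minimal, untested sketch: the resource group, location, availability set, NSG and admin user names, the Standard_B2s size and the image URN RedHat:RHEL:7.3:latest are assumptions to replace with your own values, and Premium_LRS stands in for the SSD disk type. By default (DRY_RUN=1) the script only prints the az commands.

```shell
#!/usr/bin/env bash
# Sketch: create the four RHEL virtual machines of this architecture with the Azure CLI.
# DRY_RUN=1 (default) prints the az commands; DRY_RUN=0 runs them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical names and values: adapt them to your environment.
RESOURCE_GROUP="${RESOURCE_GROUP:-aleducResourceGroup}"
LOCATION="${LOCATION:-westeurope}"
AVAILABILITY_SET="${AVAILABILITY_SET:-aleducAvailabilitySet}"
NSG_NAME="${NSG_NAME:-aleducNSG}"
ADMIN_USER="${ADMIN_USER:-azureuser}"
# Assumed image URN (Publisher:Offer:Sku:Version) and an assumed small VM size.
IMAGE="${IMAGE:-RedHat:RHEL:7.3:latest}"
VM_SIZE="${VM_SIZE:-Standard_B2s}"
VM_NAMES="aleducTac1 aleducTac2 aleducJobServer1 aleducJobServer2"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Shared resources: resource group, availability set and network security group.
create_shared_resources() {
  run az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
  run az vm availability-set create --resource-group "$RESOURCE_GROUP" --name "$AVAILABILITY_SET"
  run az network nsg create --resource-group "$RESOURCE_GROUP" --name "$NSG_NAME"
}

# One VM: SSD OS disk, static public IP, shared availability set and NSG.
create_vm() {
  run az vm create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$1" \
    --image "$IMAGE" \
    --size "$VM_SIZE" \
    --storage-sku Premium_LRS \
    --admin-username "$ADMIN_USER" \
    --generate-ssh-keys \
    --availability-set "$AVAILABILITY_SET" \
    --nsg "$NSG_NAME" \
    --public-ip-address-allocation static
}

create_shared_resources
for vm in $VM_NAMES; do
  create_vm "$vm"
done
```

After az login, run it with DRY_RUN=0. To check which RHEL image versions are actually available, az vm image list --publisher RedHat --offer RHEL --all --output table lists the image URNs.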

Configuring Azure for Talend Administration Center

This section explains how to configure Azure for Talend Administration Center.

Procedure

  1. Configure your Azure Network Security Group (NSG). Add an inbound security rule that allows access to the Talend Administration Center Tomcat on port 8080.

    Azure creates the SSH rule automatically when it deploys the virtual machine, but that rule only opens the SSH port, which is not sufficient to reach Tomcat.

  2. Configure the firewall on the Red Hat system: open port 8080 so that you can access the Talend Administration Center Tomcat from your browser.
  3. Install Talend Administration Center on the virtual machine.
  4. Configure Talend Administration Center in cluster mode: in the file <TAC_HOME>/apache-tomcat/webapps/org.talend.administrator/WEB-INF/classes/quartz.properties, uncomment the lines that enable Quartz clustering.
  5. Create a second virtual machine with the same settings and a different name, aleducTac2 for example.
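Steps 1 and 2 can be sketched as the following script. It is a minimal sketch that assumes the Azure CLI for the NSG part, firewalld (the default firewall on Red Hat 7) with its public zone for the VM part, and hypothetical resource names. The NSG rule priority 1010 is an arbitrary value that only has to be unique among the inbound rules of the NSG. With the default DRY_RUN=1, the script prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: open port 8080 (Talend Administration Center Tomcat) in Azure and on Red Hat 7.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them.
# Argument: "nsg", "firewall" or nothing (both).
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical names: use your own resource group and NSG.
RESOURCE_GROUP="${RESOURCE_GROUP:-aleducResourceGroup}"
NSG_NAME="${NSG_NAME:-aleducNSG}"
TAC_PORT=8080

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Azure side: inbound NSG rule for the Tomcat port (priority 1010 is an arbitrary unique value).
open_nsg_port() {
  run az network nsg rule create \
    --resource-group "$RESOURCE_GROUP" \
    --nsg-name "$NSG_NAME" \
    --name "Allow-TAC-$TAC_PORT" \
    --priority 1010 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --destination-port-ranges "$TAC_PORT"
}

# VM side: firewalld rule in the public zone, made permanent, then reloaded.
open_firewalld_port() {
  run sudo firewall-cmd --zone=public --add-port="$TAC_PORT/tcp" --permanent
  run sudo firewall-cmd --reload
}

case "${1:-all}" in
  nsg) open_nsg_port ;;
  firewall) open_firewalld_port ;;
  *) open_nsg_port; open_firewalld_port ;;
esac
```

In practice, you would run the nsg part once from a workstation with the Azure CLI (for example DRY_RUN=0 ./open-tac-port.sh nsg, with a hypothetical script name) and the firewall part on each Talend Administration Center virtual machine.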

Configuring Azure for Talend JobServer

This section explains how to create virtual machines and configure Azure for Talend JobServer.

Procedure

  1. Create a virtual machine for Talend JobServer by repeating the procedure used to create the Talend Administration Center virtual machines.

    Name the virtual machine aleducJobServer1, for example.

  2. Configure your Azure Network Security Group. Add inbound rules for ports 8000, 8001 and 8888 so that Talend Administration Center can access the Talend JobServer virtual machine.
  3. Configure the firewall on the Red Hat system of the virtual machine to open the same three ports.
  4. Install Talend JobServer on the virtual machine.
  5. Create a second virtual machine with the same settings and a different name, aleducJobServer2 for example.
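The JobServer rules follow the same pattern as the Talend Administration Center ones, with three ports instead of one. This minimal sketch reuses the same assumptions (Azure CLI, firewalld public zone, hypothetical resource names) and an arbitrary NSG priority of 1020; it prints the commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: open the Talend JobServer ports 8000, 8001 and 8888 in Azure and on Red Hat 7.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them.
# Argument: "nsg", "firewall" or nothing (both).
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical names: use your own resource group and NSG.
RESOURCE_GROUP="${RESOURCE_GROUP:-aleducResourceGroup}"
NSG_NAME="${NSG_NAME:-aleducNSG}"
JOBSERVER_PORTS="8000 8001 8888"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Azure side: one inbound rule covering the three ports.
# $JOBSERVER_PORTS is left unquoted on purpose so each port becomes a separate argument.
open_nsg_ports() {
  run az network nsg rule create \
    --resource-group "$RESOURCE_GROUP" \
    --nsg-name "$NSG_NAME" \
    --name Allow-JobServer \
    --priority 1020 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --destination-port-ranges $JOBSERVER_PORTS
}

# VM side: one permanent firewalld rule per port, then a single reload.
open_firewalld_ports() {
  for port in $JOBSERVER_PORTS; do
    run sudo firewall-cmd --zone=public --add-port="$port/tcp" --permanent
  done
  run sudo firewall-cmd --reload
}

case "${1:-all}" in
  nsg) open_nsg_ports ;;
  firewall) open_firewalld_ports ;;
  *) open_nsg_ports; open_firewalld_ports ;;
esac
```

Without --source-address-prefixes, the rule accepts traffic from any source. If you register the JobServers in Talend Administration Center with their private IP addresses, you could narrow it, for example to the VirtualNetwork service tag.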

Configuring Azure Storage for high availability storage

This section explains how to configure Azure Storage to collect diagnostics data and share files between your Talend Administration Center instances.

Azure Storage is a cloud service that provides highly available storage. It is secure, scalable, and redundant.

For more information, see Introduction to Microsoft Azure Storage.

Procedure

  1. Configure your storage account as follows.
    • Name: Define the name of the storage account, aleducstorage, for example.
    • Resource group: Choose the same resource group as for the Talend Administration Center virtual machines.
    • Location: Choose the same location as for the Talend Administration Center virtual machines.
  2. Create a file share named tac to share the Talend Administration Center log files.
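The storage account and the tac share can also be created from the Azure CLI. In this minimal sketch, the resource group, the location, the Standard_LRS redundancy option and the StorageV2 kind are assumptions. The command az storage share-rm create creates the share through Azure Resource Manager, so the script does not need the account key. The helper is_valid_storage_name checks Azure's naming rule for storage accounts (3 to 24 characters, lowercase letters and digits only) before anything is created.

```shell
#!/usr/bin/env bash
# Sketch: create the storage account and the "tac" file share with the Azure CLI.
# DRY_RUN=1 (default) prints the az commands; DRY_RUN=0 runs them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical values: adapt them to your environment.
RESOURCE_GROUP="${RESOURCE_GROUP:-aleducResourceGroup}"
LOCATION="${LOCATION:-westeurope}"
STORAGE_ACCOUNT="${STORAGE_ACCOUNT:-aleducstorage}"
SHARE_NAME="tac"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Storage account names: 3 to 24 characters, lowercase letters and digits only.
is_valid_storage_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]{3,24}$'
}

create_storage() {
  if ! is_valid_storage_name "$STORAGE_ACCOUNT"; then
    echo "invalid storage account name: $STORAGE_ACCOUNT" >&2
    return 1
  fi
  # Standard_LRS is an assumed redundancy option; choose the SKU that matches your needs.
  run az storage account create \
    --name "$STORAGE_ACCOUNT" \
    --resource-group "$RESOURCE_GROUP" \
    --location "$LOCATION" \
    --sku Standard_LRS \
    --kind StorageV2
  # Management-plane command: creates the file share without handling the account key.
  run az storage share-rm create \
    --resource-group "$RESOURCE_GROUP" \
    --storage-account "$STORAGE_ACCOUNT" \
    --name "$SHARE_NAME"
}

create_storage
```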

Configuring Azure Storage on Talend Administration Center virtual machines

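This section explains how to install the Samba client on your Talend Administration Center virtual machines so that they can mount the shared Azure Storage files.

Azure Files, the file service of Azure Storage, provides network file shares in the cloud that use the standard Server Message Block (SMB) protocol.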

For more information, see Introduction to Azure Files.

Procedure

  1. Install the Samba client packages, including cifs-utils, which provides the CIFS mount helper used by mount -t cifs.
  2. Create a directory to use as a mount point.
  3. Mount the Azure Storage file share on the directory you created.
    sudo mount -t cifs //<storage-account-name>.file.core.windows.net/<share-name> <mymountpoint> -o vers=3.0,username=<storage-account-name>,password=<storage-account-key>,dir_mode=0777,file_mode=0777
    • <storage-account-name>: Name of your Azure Storage account.
    • <share-name>: Name of the file share created on Azure Storage.
    • <mymountpoint>: Path of the directory created in step 2, where the Azure Storage file share is mounted.
    • <storage-account-key>: Key displayed in the Access keys section of your storage account.
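Steps 1 to 3 can be combined into the following sketch, to run on each Talend Administration Center virtual machine. It is a minimal sketch: the packages are the Samba client packages available on Red Hat (cifs-utils provides the CIFS mount helper), the mount point /mnt/tac is a hypothetical choice, and STORAGE_KEY must be supplied through the environment. In dry-run mode (the default), the printed mount command includes the key.

```shell
#!/usr/bin/env bash
# Sketch: install the SMB client and mount the Azure Files share on a RHEL 7 VM.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

STORAGE_ACCOUNT="${STORAGE_ACCOUNT:-aleducstorage}"
SHARE_NAME="${SHARE_NAME:-tac}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/tac}"   # hypothetical mount point
# Placeholder: pass the real key through the environment, never hard-code it.
STORAGE_KEY="${STORAGE_KEY:-<storage-account-key>}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Network path of an Azure Files share: //<account>.file.core.windows.net/<share>
share_path() {
  printf '//%s.file.core.windows.net/%s\n' "$1" "$2"
}

mount_share() {
  run sudo yum install -y samba-client samba-common cifs-utils
  run sudo mkdir -p "$MOUNT_POINT"
  # Note the space between the share path and the mount point: they are two arguments.
  run sudo mount -t cifs "$(share_path "$STORAGE_ACCOUNT" "$SHARE_NAME")" "$MOUNT_POINT" \
    -o "vers=3.0,username=$STORAGE_ACCOUNT,password=$STORAGE_KEY,dir_mode=0777,file_mode=0777"
}

mount_share
```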

Configuring Azure Database for MySQL

This section explains how to deploy an Azure Database for MySQL server in which Talend Administration Center stores its data.

Azure Database for MySQL is a relational database service based on MySQL Community Edition database engine. It is scalable according to your needs. For more information, see What is Azure Database for MySQL?

Procedure

  1. Configure your Azure Database for MySQL as follows.
    • Name: Name your database server, aleduc-mysql, for example.
    • Resource group: Choose the same resource group as for the Talend Administration Center virtual machines.
    • Server admin login name: Define the server admin login name you want to use to connect to the MySQL database.
    • Location: Choose the same location as for the Talend Administration Center virtual machines.
    • Version: Choose version 5.7, the version recommended with Talend 6.4.
    • Pricing tier: Choose the size according to your needs.
  2. Configure the connection security of your Azure Database for MySQL: add firewall rules for the public IP addresses of your Talend Administration Center instances.
    Note: It is recommended to choose static IP addresses to make sure that the virtual machines keep the same IP addresses after a reboot.
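The following Azure CLI sketch creates the server and the two firewall rules. It targets the single-server deployment option of Azure Database for MySQL (az mysql server commands), which was the only option when this article was written; Azure has since retired it in favor of Flexible Server, whose commands live under az mysql flexible-server. The admin login, password, GP_Gen5_2 pricing tier and the documentation-range addresses 203.0.113.10 and 203.0.113.11 are placeholders. The helpers mysql_host and mysql_login show the host name and the user@server login format that a single server expects in the Talend Administration Center database settings.

```shell
#!/usr/bin/env bash
# Sketch: create an Azure Database for MySQL single server and allow both TAC instances.
# DRY_RUN=1 (default) prints the az commands; DRY_RUN=0 runs them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical values: adapt them to your environment.
RESOURCE_GROUP="${RESOURCE_GROUP:-aleducResourceGroup}"
LOCATION="${LOCATION:-westeurope}"
MYSQL_SERVER="${MYSQL_SERVER:-aleduc-mysql}"
ADMIN_USER="${ADMIN_USER:-tacadmin}"
ADMIN_PASSWORD="${ADMIN_PASSWORD:-<password>}"
# Placeholder public IPs (documentation range) of aleducTac1 and aleducTac2.
TAC_PUBLIC_IPS="${TAC_PUBLIC_IPS:-203.0.113.10 203.0.113.11}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_mysql_server() {
  # GP_Gen5_2 is an assumed pricing tier; choose it according to your needs.
  run az mysql server create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$MYSQL_SERVER" \
    --location "$LOCATION" \
    --admin-user "$ADMIN_USER" \
    --admin-password "$ADMIN_PASSWORD" \
    --sku-name GP_Gen5_2 \
    --version 5.7
}

# One firewall rule per Talend Administration Center public IP (start and end = that IP).
allow_tac_ips() {
  i=1
  for ip in $TAC_PUBLIC_IPS; do
    run az mysql server firewall-rule create \
      --resource-group "$RESOURCE_GROUP" \
      --server-name "$MYSQL_SERVER" \
      --name "AllowTac$i" \
      --start-ip-address "$ip" \
      --end-ip-address "$ip"
    i=$((i + 1))
  done
}

# Host name and login format of a single server, as used in the TAC database settings.
mysql_host() { printf '%s.mysql.database.azure.com\n' "$MYSQL_SERVER"; }
mysql_login() { printf '%s@%s\n' "$ADMIN_USER" "$MYSQL_SERVER"; }

create_mysql_server
allow_tac_ips
```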

Configuring Azure Load Balancer for Talend Administration Center

This section explains how to configure Azure Load Balancer to switch to the second Talend Administration Center if the first one is unavailable.

For more information, see Azure Load Balancer overview.

Procedure

  1. Create your Azure Load Balancer as follows.
    • Name: Name your load balancer, aleducLoadBalancer, for example.
    • Type: Select the type of Azure Load Balancer. In this case, choose Public so that the load balancer has a public IP address.
      Note: Choose a static IP address for your Azure Load Balancer to make sure that it keeps the same IP address after a reboot.
    • Resource group: Choose the same resource group as for the Talend Administration Center virtual machines.
    • Location: Choose the same location as for the Talend Administration Center virtual machines.
  2. Define the name of your Azure Load Balancer Backend Pool, aleducLoadBalancerBackendPool, for example.

    The Backend Pool contains the virtual machines that the Azure Load Balancer can use according to the rules defined.

  3. Add the two Talend Administration Center virtual machines.
  4. Configure the Azure Load Balancer Health Probe as follows.
    • Name: Define the name of your Azure Load Balancer Health Probe, aleducLoadBalancerHealthProbe, for example. The probe tests the availability of the Talend Administration Center instances.
    • Protocol: Choose HTTP to check the Tomcat availability.
    • Port: Define port 8080, the Tomcat port in this case.
    • Path: Define the URL path that the probe requests on each instance. The probe considers an instance healthy only when this path returns HTTP 200.
    • Interval: Define the interval, in seconds, at which the probe checks the availability of an instance.
    • Unhealthy threshold: Define the number of consecutive probe failures after which an instance is considered unavailable and traffic switches to the other instance.
  5. Configure the Azure Load Balancing Rule as follows.
    • Name: Define the name of your Azure Load Balancing Rule, aleducLoadBalancerRule, for example.
    • IP Version: Choose IPv4.
    • Frontend IP address: Choose the frontend IP address corresponding to the static IP address defined during the creation of the load balancer.
    • Protocol: Choose TCP.
    • Port: Define port 8080, the Tomcat port in this case.
    • Backend port: Define backend port 8080.
    • Backend pool: Choose the backend pool you defined before.
    • Health probe: Choose the health probe you defined before.
    • Session persistence: Choose Client IP, so that requests from a given browser keep reaching the same Talend Administration Center instance.
  6. Click OK.
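The load balancer, health probe and load balancing rule above can be sketched with the Azure CLI as follows. Besides the names taken from this article, the public IP and frontend names, the probe path /org.talend.administrator/, the 15-second interval and the threshold of 2 are assumptions. The Standard SKU is an assumption too: Azure is retiring the Basic SKU, and a Standard load balancer requires any public IP addresses attached to the backend VMs to be Standard SKU as well. SourceIP load distribution is the CLI equivalent of Client IP session persistence. The helper failover_seconds gives a rough estimate (interval multiplied by threshold) of how long a failed instance can keep receiving traffic before the probe takes it out of rotation.

```shell
#!/usr/bin/env bash
# Sketch: public load balancer, HTTP health probe and TCP 8080 rule for the two TAC VMs.
# DRY_RUN=1 (default) prints the az commands; DRY_RUN=0 runs them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

RESOURCE_GROUP="${RESOURCE_GROUP:-aleducResourceGroup}"   # hypothetical
LB_NAME="aleducLoadBalancer"
PUBLIC_IP_NAME="${PUBLIC_IP_NAME:-aleducLoadBalancerIP}"   # hypothetical
FRONTEND_NAME="${FRONTEND_NAME:-aleducLoadBalancerFrontend}" # hypothetical
POOL_NAME="aleducLoadBalancerBackendPool"
PROBE_NAME="aleducLoadBalancerHealthProbe"
RULE_NAME="aleducLoadBalancerRule"
TAC_PORT=8080
# Assumed probe path: it must answer HTTP 200 on a healthy instance.
PROBE_PATH="${PROBE_PATH:-/org.talend.administrator/}"
PROBE_INTERVAL=15    # seconds between probes (example value)
PROBE_THRESHOLD=2    # consecutive failures before an instance is marked down (example value)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_load_balancer() {
  run az network lb create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$LB_NAME" \
    --sku Standard \
    --public-ip-address "$PUBLIC_IP_NAME" \
    --public-ip-address-allocation Static \
    --frontend-ip-name "$FRONTEND_NAME" \
    --backend-pool-name "$POOL_NAME"
  run az network lb probe create \
    --resource-group "$RESOURCE_GROUP" \
    --lb-name "$LB_NAME" \
    --name "$PROBE_NAME" \
    --protocol Http \
    --port "$TAC_PORT" \
    --path "$PROBE_PATH" \
    --interval "$PROBE_INTERVAL" \
    --threshold "$PROBE_THRESHOLD"
  # SourceIP distribution corresponds to "Client IP" session persistence in the portal.
  run az network lb rule create \
    --resource-group "$RESOURCE_GROUP" \
    --lb-name "$LB_NAME" \
    --name "$RULE_NAME" \
    --protocol Tcp \
    --frontend-ip-name "$FRONTEND_NAME" \
    --frontend-port "$TAC_PORT" \
    --backend-port "$TAC_PORT" \
    --backend-pool-name "$POOL_NAME" \
    --probe-name "$PROBE_NAME" \
    --load-distribution SourceIP
}

# Rough estimate, in seconds, before a failed instance leaves the rotation.
failover_seconds() {
  echo $((PROBE_INTERVAL * PROBE_THRESHOLD))
}

create_load_balancer
```

The backend pool still needs the two virtual machines: az network nic ip-config address-pool add attaches the IP configuration of each VM's network interface to aleducLoadBalancerBackendPool. Before relying on the probe, you can check that the chosen path answers 200 on each instance, for example with curl -s -o /dev/null -w '%{http_code}\n' http://<tac-vm-ip>:8080/org.talend.administrator/.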