Talend ESB runtime auto-scaling on AWS platform

author
Xin Liu
EnrichVersion
6.4
6.3
task
Deployment
Administration and Monitoring
EnrichPlatform
Talend ESB

Talend ESB auto-scaling on AWS platform overview

Online shopping companies need a stable and scalable environment for their web service applications, which must be 100% available during sales events (Christmas sales, Black Friday sales, and so on). This becomes very challenging when heavy capacity preparation is needed before each event, such as bringing in extra server infrastructure, deploying web service code, and applying manual configurations.

In this article, you will see how an online shopping company can use a modern auto-scaling design built with Talend and the AWS platform to handle both stable demand patterns and usage variability.

AWS Auto Scaling helps maintain application availability and allows you to scale your Amazon EC2 capacity up or down automatically according to the conditions you define. You can use Auto Scaling to help ensure that you are running your desired number of Amazon EC2 instances. Auto Scaling can also automatically increase the number of Amazon EC2 instances during demand spikes to maintain performance, and decrease capacity during lulls to reduce costs.

Talend ESB Runtime provides an Apache Karaf-based ESB container pre-configured to support Talend Mediation route (Apache Camel routing) and Talend Web Services Jobs (Apache CXF-based services - both REST and SOAP-based).

In this architecture, you will build a Talend Data Services platform hosted on Amazon EC2 instances, and set up AWS Auto Scaling to increase or decrease the number of Talend ESB Runtime servers based on demand.

When you have completed the entire Talend Runtime Auto-Scaling demo, you will be able to build a scalable environment for your web service applications by using Talend and AWS technologies.

You may implement a similar solution based on your needs, and bring more AWS services into your design, such as OpsWorks, CloudFormation, Elastic Beanstalk, and Lambda functions. The AWS Auto Scaling feature is also not limited to Talend Runtime servers: other Talend server components, such as Talend JobServer for ETL Job processing, could equally benefit from it, so the technologies mentioned here apply to more use cases. You can find more Talend and AWS use cases on https://www.talend.com.

Talend ESB runtime auto-scaling architecture on AWS platform


Main Talend Runtime Auto-Scaling process:

Procedure

  • Talend Administration Center (TAC) has one or more pre-configured Runtime servers (from the same Auto Scaling group), and the web service applications (Talend Jobs or Routes) have been deployed from Nexus to each Runtime container. The existing Runtimes have an AWS Elastic Load Balancer in front of them to distribute incoming traffic.
  • One or more Runtime EC2 instances exceed a resource capacity limit pre-defined in AWS CloudWatch (for example, the CPU of one Runtime EC2 instance has been running above 80% for the last 30 minutes because of increased web service requests during a sales event).
  • AWS CloudWatch notifies the Talend Auto Scaling group (AWS Launch Configuration) to spin up one or more Runtime server EC2 instances behind the same Load Balancer, so that the increased incoming traffic can be redirected to the new Runtime instances.
  • Once a new Runtime instance is initialized by the Auto Scaling group, it uses Talend MetaServlet calls, run from Linux service scripts, to:
    1. Register itself in TAC as a new Runtime server.
    2. Create a new task in TAC/ESB Conductor and deploy the same service from the Nexus repository to the new Runtime container.
      Note: No human intervention is required during the Talend Runtime Auto-Scaling process.
  • Once incoming traffic is back to “normal” (for example, the CPU of every Runtime server EC2 instance has been running under 20% for the last hour), AWS CloudWatch notifies the Auto Scaling group to stop/terminate Runtime server instances. Talend MetaServlet scripts can be used here to:
    1. Undeploy the service from the Runtime server.
    2. Remove the service from TAC/ESB Conductor.
    3. Deregister the Talend Runtime in the TAC/Servers configuration page before shutdown.
      Note: All the above steps remain transparent to client requests during the entire process, since the AWS Load Balancer provides a unique service entry endpoint.
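The two CPU thresholds in the process above (above 80% for 30 minutes to scale out, under 20% for one hour to scale in) could be expressed as CloudWatch alarms. The sketch below only prints the corresponding AWS CLI commands so they can be reviewed before being run. The alarm names, the Auto Scaling group name TalendRuntimeASG, the 300-second period, and the Average statistic are assumptions for illustration, and the --alarm-actions option that links an alarm to a scaling policy is left out.

```shell
#!/bin/bash
# Print (do not run) an "aws cloudwatch put-metric-alarm" command for a CPU alarm.
# Arguments: alarm name, Auto Scaling group name, comparison operator,
#            CPU threshold in percent, observation window in minutes.
# Assumptions: 300-second periods and the Average statistic; all names are hypothetical.
print_cpu_alarm_command() {
    local name="$1" group="$2" operator="$3" threshold="$4" window_minutes="$5"
    local period=300
    # Number of consecutive periods that must breach the threshold
    local evaluation_periods=$(( window_minutes * 60 / period ))

    echo "aws cloudwatch put-metric-alarm" \
        "--alarm-name $name" \
        "--namespace AWS/EC2 --metric-name CPUUtilization --statistic Average" \
        "--dimensions Name=AutoScalingGroupName,Value=$group" \
        "--period $period --evaluation-periods $evaluation_periods" \
        "--threshold $threshold --comparison-operator $operator"
}

# Scale-out condition: CPU above 80% for the last 30 minutes
print_cpu_alarm_command talend-rt-cpu-high TalendRuntimeASG GreaterThanThreshold 80 30
# Scale-in condition: CPU under 20% for the last hour
print_cpu_alarm_command talend-rt-cpu-low TalendRuntimeASG LessThanThreshold 20 60
```

Printing the command rather than executing it keeps the sketch safe to run on any machine; the output can be pasted into a shell that has the AWS CLI configured.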

Talend ESB runtime auto-scaling assumptions

Amazon Web Services (AWS):

  • You should be familiar with the AWS platform, as this article does not go into detail on the administration and management of AWS services. You can refer to Amazon Web Services (AWS) - Getting Started to read more about AWS functionality.
  • You should also have full access to the main AWS services described in the Prerequisites section below.

Talend:

  • You should be familiar with the environment and prerequisites for installing and managing Talend Data Services Platform.
  • You should have basic knowledge of the Talend MetaServlet API. For further information, see Talend Administration Center MetaServlet API commands.

Talend ESB runtime auto-scaling platform prerequisites

A valid AWS account with full access to the following services:

Amazon Elastic Compute Cloud

Amazon CloudWatch

Amazon Auto Scaling

You also need valid AWS access keys to access AWS services programmatically. Read the AWS documentation to know how to create, manage, and use AWS access keys.

Enabling TAC and Nexus on EC2 and configuration

Launching an EC2 instance

In this article, the Talend Administration Center (TAC) and Nexus repository have been installed on the same EC2 instance.

Procedure

  1. Connect to the EC2 console in any region and click Launch Instance.
  2. Choose an AMI: Microsoft Windows Server 2012 Base is used for this demo, but any OS supported by Talend can be used.

    For further information, see Compatible Operating Systems.

  3. Select m4.large (or m4.xlarge if a more powerful machine is needed) and configure the instance details:
    1. Network: Choose your VPC (use the default if you don’t have a specific VPC configured)
    2. Subnet: No preference
    3. Auto-assign Public IP: Disable (you will configure an Elastic IP later to use a static IP address)
    4. Other options: keep the defaults
  4. Add Storage: use a size of 50 GiB and the General Purpose SSD volume type (this gives you enough disk space for the OS and the Talend installation).
  5. Add Tag: Key: Name, Value: TalendRuntimeAutoScaling-TAC&Nexus
  6. Configure Security Group: Create a new Security Group as below (only for this demo; you should have stricter rules in a real case).
    1. Inbound:
    2. Outbound:
  7. Launch the EC2 instance once everything is configured. Once the instance has been created in the EC2 dashboard, select the Elastic IPs option in the AWS EC2 interface to configure a static IP address for the TAC EC2 instance.
  8. Select Allocate new address and then choose the newly created TAC EC2 instance ID. Once configured, you will find allocation information similar to that shown below:

Installing Talend Administration Center

Procedure

  1. Install Talend Administration Center. Nexus is also installed, under the TAC path <tac>/Artifact-Repository-Nexus-V2.11.3-01/.

    For further information on TAC installation, see Installing your Talend product using Talend Installer (recommended).

  2. Once TAC has been installed on the EC2 instance, verify that you can access the TAC interface from a supported web browser at http(s)://{TACHost}:8080/org.talend.administrator.
  3. Keep using the default login/password, admin@company.com/admin.
  4. Also verify the Nexus interface from a web browser at http(s)://{TACHost}:8081/nexus.
    Note: If you can’t see the interfaces above, also check that the Windows Server firewall has been opened on ports 8080-8081 for inbound requests, as shown below.
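The browser checks above can also be scripted, which is convenient when testing from another EC2 instance. The following is a minimal sketch that assumes curl is available; the function names and the example host name are hypothetical, and the ports and paths are the ones used in this article.

```shell
#!/bin/bash
# Build the TAC and Nexus URLs for a given host (ports and paths as used in this article).
tac_url()   { echo "http://$1:8080/org.talend.administrator"; }
nexus_url() { echo "http://$1:8081/nexus"; }

# Print the HTTP status code returned by a URL (curl prints 000 when there is no response).
http_status() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Example with a hypothetical host name:
#   http_status "$(tac_url tac.example.com)"
#   http_status "$(nexus_url tac.example.com)"
```

A status of 000 usually points to a network-level problem, such as a Security Group or firewall rule, rather than to the application itself.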

Preparing a Talend REST service Job

You also need to prepare a Talend Web Service Job/Route (REST/SOAP) that can be deployed to the Talend Runtime instances later in this demo. A demo Job is available for download; it performs the steps below.

Procedure

  1. Read the hostname of the current EC2 Runtime instance by requesting the AWS EC2 metadata.
  2. Receive a REST request call on the following URL: http(s)://{RuntimeHost}:8040/services/test/{YourInputString}.
  3. Respond with an XML message in the following form: {YourInputString} from server: {AWS EC2 Runtime instance hostname}

    Retrieve the Job export AWSTestWSJobExport.zip file from the Downloads tab in the left panel of this page.

  4. Once you have your Job/Route prepared in your Studio, publish it to the Nexus instance at http(s)://{TACHost}:8081/nexus, so you can use the same artifact for this demo. The artifact information assumed once published is:
    • JobName: testRest
    • Feature version: 0.1.0-SNAPSHOT
    • Nexus repo: Snapshots/org.example

Enabling Talend Runtime on EC2 configuration

Launching an EC2 instance

Procedure

  1. Connect to the EC2 console in any region and click Launch Instance.
  2. Choose an AMI: Microsoft Windows Server 2012 Base is used for this demo, but any OS supported by Talend can be used.

    For further information, see Compatible Operating Systems.

  3. Select m4.large (or m4.xlarge if a more powerful machine is needed) and configure the instance details:
    1. Network: Choose your VPC (use the default if you don’t have a specific VPC configured)
    2. Subnet: No preference
    3. Auto-assign Public IP: Disable (you will configure an Elastic IP later to use a static IP address)
    4. Other options: keep the defaults
  4. Add Storage: use a size of 50 GiB and the General Purpose SSD volume type (this gives you enough disk space for the OS and the Talend installation).
  5. Add Tag: Key: Name, Value: TalendRuntimeAutoScaling-Runtime
  6. Configure Security Group: Create a new Security Group as below (only for this demo; you should have stricter rules in a real case).
    1. Inbound:
    2. Outbound:
  7. Launch the EC2 instance once everything is configured.
  8. (Optional) Once the instance has been created in the EC2 dashboard, select the Elastic IPs option in the AWS EC2 interface to configure a static IP address for the Runtime EC2 instance.
  9. Select Allocate new address and then choose the newly created Runtime EC2 instance ID. Once configured, you will find allocation information similar to that shown below:

Installing Talend Runtime

Procedure

  1. Follow the Talend installation guide to install and configure your Talend Runtime server (either with the installer or manually). In this demo, the Runtime has been installed under the path “/app/Talend-Runtime-V6.3.1” as shown below:
  2. Install Talend Runtime as a service and install the Talend Runtime wrapper, so that you have a Linux service script for starting and stopping the Runtime service.

    To install the Talend runtime wrapper, see Talend Runtime.

Talend Metaservlet API to be used at Talend Runtime host setup

In this article, you will use the Talend MetaServlet API calls listed below, so that AWS Auto Scaling allows the Talend Runtime host to interact automatically with the TAC host without any manual configuration:

  • "addServer": Declare Talend Runtime server to TAC.
  • "saveEsbTask": Create a new ESB task (service) in TAC/ESB conductor.
  • "requestDeployEsbTask": Request Talend Runtime to deploy the ESB task previously created in "saveEsbTask".
  • "requestUndeployEsbTask": Request Talend Runtime to undeploy ESB task specified in script.
  • "deleteEsbTask": Delete the undeployed ESB task.
  • "removeServer": Remove Talend Runtime server declaration from TAC.

For complete Talend MetaServlet API details, see Talend Administration Center MetaServlet API commands.

Building a Linux shell script to convert clear text to Base64 encoding string

A Talend MetaServlet request string needs to be encoded in Base64 when calling the TAC MetaServlet REST API.

Take the “addServer” MetaServlet action as an example of a JSON script to post, as shown below. This JSON sample adds a new JobServer/Runtime server to the TAC Servers page.

{"actionName": "addServer","adminConsolePort": 8040,"authPass": "admin","authUser":"admin@company.com",
"commandPort": 8000,"description": "RemoteRT", "filePort": 8001,"host": "34.251.88.225",
"instance": "trun", "label": "Remote RT server auto-scaling","mgmtRegPort": 1099,
"mgmtServerPort": 44444,"monitoringPort": 8888,"runtimePassword": "tadmin",
"runtimeUsername": "tadmin","shutdownBehavior": "Stop",
"timeoutUnknownState": "120","useSSL": false}

Procedure

  1. Send this JSON script to the TAC MetaServlet API URL. The REST WS call should be as follows:
    http://{TACHost}:8080/org.talend.administrator/metaServlet?{"actionName": "addServer",
    "adminConsolePort": 8040,"authPass":  "admin","authUser": "admin@company.com",
    "commandPort": 8000,"description": "RemoteRT","filePort": 8001,
    "host": "34.251.88.225","instance": "trun","label": "Remote RT server auto-scaling",
    "mgmtRegPort": 1099,"mgmtServerPort": 44444,"monitoringPort": 8888,"runtimePassword":"tadmin",
    "runtimeUsername": "tadmin","shutdownBehavior": "Stop","timeoutUnknownState": "120","useSSL": false}
  2. However, the JSON string provided to the API URL needs to be encoded in Base64, so the final REST WS call will be as shown below:
    http://{TACHost}:8080/org.talend.administrator/metaServlet?eyJhY3Rpb25OYW1lI
    jogImFkZFNlcnZlciIsImFkbWluQ29uc29sZVBvcnQiOiA4MDQwLCJhdXRoUGFzcyI6ICJhZG1pbiIsImF1dGhVc2VyI
    jogImFkbWluQGNvbXBhbnkuY29tIiwiY29tbWFuZFBvcnQiOiA4MDAwLCJkZXNjcmlwdGlvbiI6ICJSZW1vdGVSVCIsI
    mZpbGVQb3J0IjogODAwMSwiaG9zdCI6ICIzNC4yNTEuODguMjI1IiwiaW5zdGFuY2UiOiAidHJ1biIsImxhYmVsIjogI
    lJlbW90ZSBSVCBzZXJ2ZXIgYXV0by1zY2FsaW5nIiwibWdtdFJlZ1BvcnQiOiAxMDk5LCJtZ210U2VydmVyUG9ydCI6I
    DQ0NDQ0LCJtb25pdG9yaW5nUG9ydCI6IDg4ODgsInJ1bnRpbWVQYXNzd29yZCI6ICJ0YWRtaW4iLCJydW50aW1lVXNlcm
    5hbWUiOiAidGFkbWluIiwic2h1dGRvd25CZWhhdmlvciI6ICJTdG9
    wIiwidGltZW91dFVua25vd25TdGF0ZSI6ICIxMjAiLCJ1c2VTU0wiOiBmYWxzZX0
  3. Encode the string by using the Linux shell script example shown below. Save the script as a base64url.sh file under the /app directory (or at any location of your preference).
    #! /bin/bash
                            
    table=(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 - _ =)
                            
    #text="{\"alg\":\"RS256\",\"typ\":\"JWT\"}"
    #text="{\"iss\":\"761326798069-r5mljlln1rd4lrbhg75efgigp36m78j5@developer.gserviceaccount.com\",\"scope\":\"https://www.googleapis.com/auth/prediction\",\"aud\":\"https://accounts.google.com/o/oauth2/token\",\"exp\":1328554385,\"iat\":1328550785}"
                            
    # Read the clear-text string from standard input (-r keeps backslashes intact)
    IFS= read -r text

    # Break the binary representation of $text into 6-bit blocks
    for line in $(echo -n "$text" | xxd -b -g 0 | cut -d ' ' -f 2 | paste -s -d '' | fold -w 6 -s)
    do
        length=${#line}
        let padding=6-$length
        s="0"
        p=""

        # Pad a short final block to 6 bits by adding zeros to the right
        if (($padding > 0)); then
            while ((${#p} < "$padding")); do
                p="$p$s";
            done;
        fi

        # Convert binary to decimal
        n=$(echo "ibase=2;$line$p" | bc)
        # Output the character looked up in the table
        echo -n ${table[n]}
    done
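As a cross-check for the script above, the standard base64 command found on most Linux systems can encode and decode the same strings. This is a sketch of an alternative and not the method used in this article: unlike base64url.sh, the base64 command emits trailing = padding and uses + and / instead of - and _, and whether TAC accepts both variants is not confirmed here. The function names are hypothetical.

```shell
#!/bin/bash
# Encode a string as standard Base64 on a single line.
# tr removes the line wraps that base64 adds to long output.
encode_base64() {
    printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a Base64 string back to clear text (useful to inspect an encoded request).
decode_base64() {
    printf '%s' "$1" | base64 -d
}

encoded=$(encode_base64 '{"actionName": "addServer"}')
echo "$encoded"
decode_base64 "$encoded"; echo
```

Decoding the output of base64url.sh with decode_base64 may require re-adding the = padding first, since the script does not emit it.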
                        

Error messages when testing the base64url.sh

Depending on your Linux version, you might see the error messages below when testing base64url.sh.

xxd: command not found

If you encounter this error message, install the "vim-common" package, which provides the xxd command.

See more details about "vim-common" on https://stackoverflow.com/questions/36179338/official-fedora-package-for-xxd-command.

bc: command not found

If you encounter this error message, install the “bc” package.

See more details for “bc” on: http://thelinuxfaq.com/159-bc-command-not-found-in-centos-rhel-fedora-ubuntu
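Both errors can be caught before running base64url.sh by checking that the commands it depends on are installed. The sketch below assumes a bash shell; the function name is hypothetical, and the yum command in the comment is an assumption that applies to yum-based distributions only.

```shell
#!/bin/bash
# Print the name of every listed command that is not found on the PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Commands used by base64url.sh and by the service scripts in this article
missing=$(missing_commands xxd bc curl)
if [ -n "$missing" ]; then
    echo "Missing commands:" $missing
    # On yum-based distributions (assumption), xxd comes from vim-common:
    #   sudo yum install -y vim-common bc
fi
```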

Building a Linux service script to call TAC Metaservlet APIs

Building a start() script

In this section, you will write the scripts below to build a Linux service script, TALEND-INIT, including its start() and stop() sections.

Main steps in start() section:

Procedure

  1. Write the addServer script below to declare Talend Runtime server to TAC:
    
    echo $"Starting Talend Runtime Registration ...."
    # Read the current Runtime host by using the AWS instance metadata API; the host address is stored in the shell variable host_address.
    host_address=$(curl http://169.254.169.254/latest/meta-data/public-hostname)
                        
    # Build MetaServlet JSON script – addServer.
    metaservlet='{"actionName": "addServer","adminConsolePort": 8040,"authPass": "admin","authUser": "admin@company.com","commandPort": 8000,"description": "RemoteRT","filePort": 8001,"host": "'$host_address'","instance": "trun","label": "RemoteRT_'$host_address'","mgmtRegPort": 1099,"mgmtServerPort": 44444,"monitoringPort": 8888,"runtimePassword": "tadmin","runtimeUsername": "tadmin","shutdownBehavior": "Stop","timeoutUnknownState": "120","useSSL": false}'
                        
    # Run base64url.sh to encode “addServer” script within Base64.
    metaservlet=`echo "$metaservlet" | /app/base64url.sh`
                        
    # Execute REST Call to TAC MetaServlet with encoded string and extract server ID from JSON response.  
    serverId=$(curl -s 'http://{TACHost}:8080/org.talend.administrator/metaServlet?' -X POST -H 'Content-Type: application/json' -d "$metaservlet" | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj["id"])')
    
    # Export the server ID as Linux environment variables so it can be used by later process.
    echo 'serverId='$serverId > /etc/environment
                    
  2. Write the saveEsbTask script below to create a new ESB task (service) in TAC/ESB Conductor.
    
    echo $"Creating new ESB Task ...."
    # Build MetaServlet JSON script – saveEsbTask.
    metaservlet='{"actionName": "saveEsbTask","authPass": "admin","authUser": "admin@company.com","description": "esbTask from '$host_address'","featureName": "testREST-feature","featureType": "SERVICE","featureUrl": "mvn:org.example/testREST-feature/0.1.0-SNAPSHOT/xml","featureVersion": "0.1.0-SNAPSHOT","repository": "snapshots","runtimeContext": "Default","runtimePropertyId": "testREST","runtimeServerName": "RemoteRT_'$host_address'","tag": "tag1","taskName": "ESBTaskMetaservlet_'$host_address'"}'
                        
    # Run base64url.sh to encode “saveEsbTask” script within Base64.
    metaservlet=`echo "$metaservlet" | /app/base64url.sh`
                        
    # Execute REST Call to TAC MetaServlet with encoded string and extract esbtask ID from JSON response.
    esbtaskId=$(curl -s 'http://{TACHost}:8080/org.talend.administrator/metaServlet?' -X POST -H 'Content-Type: application/json' -d "$metaservlet" | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj["taskId"])')
                        
    # Export the ESB Task ID as Linux environment variables so it can be used by later process.
    echo 'esbtaskId='$esbtaskId >> /etc/environment
                    
  3. Write the requestDeployEsbTask script below to request Talend Runtime to deploy the ESB task previously created in "saveEsbTask".
    echo $"Deploying new ESB Task ...."
    # Build MetaServlet JSON script – requestDeployEsbTask.
    metaservlet='{"actionName": "requestDeployEsbTask","authPass": "admin","authUser": "admin@company.com","taskId": '$esbtaskId'}'
                        
    # Run base64url.sh to encode “requestDeployEsbTask” script within Base64.
    metaservlet=`echo "$metaservlet" | /app/base64url.sh`
                        
    # Execute REST Call to TAC MetaServlet with encoded string.
    curl -s 'http://{TACHost}:8080/org.talend.administrator/metaServlet?' -X POST -H 'Content-Type: application/json' -d "$metaservlet"
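The three steps above repeat the same pattern: build a JSON request, encode it, and post it to TAC. The pattern could be factored into small helper functions, as in the sketch below. The function names, the TAC_HOST and ENCODER variables, and the example host name are hypothetical; the JSON layout and the curl options mirror the calls shown in this section.

```shell
#!/bin/bash
# Hypothetical settings: set TAC_HOST to your TAC host before use.
TAC_HOST="${TAC_HOST:-tac.example.com}"
ENCODER="${ENCODER:-/app/base64url.sh}"

# Build the JSON for the actions that only need a task ID
# (requestDeployEsbTask, requestUndeployEsbTask, deleteEsbTask).
build_task_request() {
    local action="$1" task_id="$2"
    printf '{"actionName": "%s","authPass": "admin","authUser": "admin@company.com","taskId": %s}' \
        "$action" "$task_id"
}

# Encode a JSON request and post it to the TAC MetaServlet,
# mirroring the curl calls used in this article.
post_metaservlet() {
    local encoded
    encoded=$(echo "$1" | "$ENCODER")
    curl -s "http://${TAC_HOST}:8080/org.talend.administrator/metaServlet?" \
        -X POST -H 'Content-Type: application/json' -d "$encoded"
}

# Example:
#   post_metaservlet "$(build_task_request requestDeployEsbTask "$esbtaskId")"
```

Keeping the host and encoder path in variables means the same functions can be reused unchanged in the stop() section.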
                    

Building a stop() script

Procedure

  1. Write the requestUndeployEsbTask script below to request Talend Runtime to undeploy ESB task specified in script:
    
    echo $"Un-deploying Talend ESB Task ...."
    # Read environment variable $esbtaskId.
    source /etc/environment
                        
    # Build MetaServlet JSON script – requestUndeployEsbTask.
    metaservlet='{"actionName": "requestUndeployEsbTask","authPass": "admin","authUser": "admin@company.com","taskId": '$esbtaskId'}'
                        
    # Run base64url.sh to encode “requestUndeployEsbTask” script within Base64.
    metaservlet=`echo "$metaservlet" | /app/base64url.sh`
                        
    # Execute REST Call to TAC MetaServlet with encoded string.
    curl -s 'http://{TACHost}:8080/org.talend.administrator/metaServlet?' -X POST -H 'Content-Type: application/json' -d "$metaservlet"
                    
  2. Write the deleteEsbTask script below to delete undeployed ESB tasks.
    
    echo $"Deleting Talend ESB Task ...."
                        
    # Build MetaServlet JSON script – deleteEsbTask, with $esbtaskId previously read from environment variables
    metaservlet='{"actionName": "deleteEsbTask","authPass": "admin","authUser": "admin@company.com","taskId": '$esbtaskId'}'

    # Run base64url.sh to encode “deleteEsbTask” script within Base64.
    metaservlet=`echo "$metaservlet" | /app/base64url.sh`
                        
    # Execute REST Call to TAC MetaServlet with encoded string.
    curl -s 'http://{TACHost}:8080/org.talend.administrator/metaServlet?' -X POST -H 'Content-Type: application/json' -d "$metaservlet"
                    
  3. Write the removeServer script below to remove Talend Runtime server declaration from TAC:
    echo $"Remove Talend Runtime server from TAC ...."
                            
    # Build MetaServlet JSON script – removeServer, with $serverId previously read from environment variables
    metaservlet='{"actionName": "removeServer","authPass": "admin","authUser": "admin@company.com","serverId": '$serverId'}'

    # Run base64url.sh to encode “removeServer” script within Base64.
    metaservlet=`echo "$metaservlet" | /app/base64url.sh`
                            
    # Execute REST Call to TAC MetaServlet with encoded string.
    curl -s 'http://{TACHost}:8080/org.talend.administrator/metaServlet?' -X POST -H 'Content-Type: application/json' -d "$metaservlet"
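The start() and stop() sections are typically wrapped in a service script that dispatches on the action passed by the service command. The skeleton below is a sketch of one possible layout and not the TALEND-INIT file provided with this article: the chkconfig header values and the talend_init function name are assumptions, and the function bodies are placeholders for the scripts written above.

```shell
#!/bin/bash
# chkconfig: 2345 99 01
# description: Registers and deregisters this Talend Runtime in TAC (sketch)

start() {
    echo "Starting Talend Runtime Registration ...."
    # The addServer, saveEsbTask and requestDeployEsbTask calls go here
}

stop() {
    echo "Un-deploying Talend ESB Task ...."
    # The requestUndeployEsbTask, deleteEsbTask and removeServer calls go here
}

# Dispatch on the action passed by the "service" command.
talend_init() {
    case "$1" in
        start)   start ;;
        stop)    stop ;;
        restart) stop; start ;;
        *)       echo "Usage: TALEND-INIT {start|stop|restart}"; return 1 ;;
    esac
}

# In a real service script, the last line would be:  talend_init "$1"
```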
                        

Executing a Linux service script

Retrieve the complete service script TALEND-INIT file from the Downloads tab in the left panel of this page.

Once downloaded, copy this file to the /etc/init.d directory on your Runtime host server.

Procedure

  1. Run the command below to grant execution permission.
    chmod 755 TALEND-INIT
  2. Activate “TALEND-INIT” as a Linux service.
    chkconfig TALEND-INIT on
  3. Change the “TALEND-INIT” chkconfig level to 123456 (to allow Linux reboot, shutdown, and so on to use this script). For details, see Linux Runlevels Explained.
    chkconfig --level 123456 TALEND-INIT on
  4. You may also test the service script by running the command below:
    service TALEND-INIT start

Starting the TALEND-INIT script

Once the TALEND-INIT script has started, you will see the following changes, in order:
  1. A new Talend Runtime server has been added to the TAC/Servers declaration page (completed by the “addServer” MetaServlet action):
  2. A new ESB task has been created in the TAC/ESB Conductor tab and has also been deployed to the new Talend Runtime server (completed by the “saveEsbTask” and “requestDeployEsbTask” MetaServlet actions):