Deploy Talend JobServer on AWS EC2

As described in the architecture, a Talend JobServer executes the Talend Job that downloads CSV files from S3 and converts them into XML files.

Now let’s launch an EC2 instance and deploy Talend JobServer on AWS.

Procedure

  1. Connect to the EC2 console in the Ireland region (eu-west-1).
  2. Click Launch instance.
  3. Choose an AMI. Amazon Linux AMI 2018.03.0 (HVM) 64-bit is a good fit for this demo.
  4. Select the t2.medium instance type, which is sufficient for this demo.
  5. Configure Instance Details.
    • Choose your default VPC.
    • Enable Auto-assign Public IP.
    • Keep other settings with default values.

    Note that to use the Inherit credentials from AWS role option of the Talend S3 components, you must define an IAM role with access to the S3 bucket and assign it to the instance. With this option, the components obtain their credentials from the role, so you do not need a Secret Access Key. To keep things simple, this article uses an access key in the S3 components.

  6. Add Storage, Size = 16GiB.

    Note that because the Job downloads the files from S3 to the execution server, the disk must be large enough to hold both the S3 input files and the output files created by your Job(s). You can design your DI Job(s) to delete all local files once the output files have been uploaded to S3.

  7. Add Tags, Key=Name and Value=Talend JobServer.
  8. Configure Security Group. Create a new security group with the following rules:
    • Add an SSH rule with values:
      • Type = SSH
      • Protocol = TCP
      • Port Range = 22
      • Source = Custom 0.0.0.0/0
    • Add a custom TCP rule with values:
      • Type = Custom TCP Rule
      • Protocol = TCP
      • Port Range = 8000
      • Source = Custom <your TAC security group id>
    • Add a custom TCP rule with values:
      • Type = Custom TCP Rule
      • Protocol = TCP
      • Port Range = 8001
      • Source = Custom <your TAC security group id>
    • Add a custom TCP rule with values:
      • Type = Custom TCP Rule
      • Protocol = TCP
      • Port Range = 8888
      • Source = Custom <your TAC security group id>
  9. Review and Launch.

    You can ignore the security group warning for this demo.

    As a best practice, avoid using 0.0.0.0/0 as the source. Instead, restrict access to the port to your corporate IP addresses.

  10. Launch the instance.
  11. Follow the section Installing and configuring your Talend JobServer in the Talend Installation Guide to install and configure your Talend JobServer.
  12. Declare the Talend JobServer as an execution server in Talend Administration Center as follows:
    1. Connect to the Talend Administration Center web interface with a web browser.
    2. Navigate to Conductor > Servers.
    3. Click Add > Add server.
    4. Use the following settings to declare your JobServer.
      • Label = job server
      • Host = <the private IP of the EC2 server hosting your JobServer>

        The private IP address can be found in the instance details in the EC2 console.

        The private IP address is sufficient for the Talend Administration Center EC2 host to reach the Talend JobServer host, because both EC2 hosts are located in the same default VPC. The public IP address is not needed.

      • Keep all other fields with default values
    5. Click Save. The server is added to the list of servers.

    You have successfully installed Talend JobServer on AWS EC2 and declared it as an execution server in Talend Administration Center.

    In this example, we are using an always-on execution server. We can also evolve this architecture to use an EC2 Server definition in Talend Administration Center. Refer to the documentation on how to add an EC2 execution server to Talend Administration Center. Talend Administration Center can start the execution server EC2 instance before executing the Job, and then shut down the EC2 instance when the Job has finished executing. You can add an EC2 Server definition per task so that an EC2 instance is started for each Job, and then shut down. This provides a scalable architecture that adheres to AWS principles, that is, only use resources when you need to compute data. When there is no file in S3 to process, there is no execution server running.
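
The console choices made in steps 3 to 8 can also be written down as EC2 API request parameters, which makes the setup easier to review and repeat. The following Python sketch is not part of the original procedure. It only builds plain dictionaries shaped like the arguments of the boto3 run_instances and authorize_security_group_ingress calls, and it does not contact AWS. The AMI ID and security group IDs are hypothetical placeholders, and the root device name /dev/xvda is an assumption that you should check against your AMI.

```python
import json

# Hypothetical placeholders: replace them with identifiers from your own account.
AMI_ID = "ami-REPLACE_ME"                 # the Amazon Linux AMI chosen in step 3
JOBSERVER_GROUP_ID = "sg-JOBSERVER"       # the security group created in step 8
TAC_GROUP_ID = "sg-TAC"                   # the security group of your TAC host


def build_ingress_rules(tac_group_id, ssh_cidr="0.0.0.0/0"):
    """Return the step 8 rules in the IpPermissions shape of the EC2 API."""
    # SSH rule: the procedure uses 0.0.0.0/0, but a narrower CIDR is preferable.
    ssh_rule = {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": ssh_cidr}],
    }
    # Ports 8000, 8001 and 8888 are opened only to the TAC security group.
    jobserver_rules = [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": tac_group_id}],
        }
        for port in (8000, 8001, 8888)
    ]
    return [ssh_rule] + jobserver_rules


def build_launch_params(ami_id, security_group_id, disk_gib=16):
    """Return keyword arguments that mirror steps 3 to 7 of the procedure."""
    return {
        "ImageId": ami_id,
        "InstanceType": "t2.medium",
        "MinCount": 1,
        "MaxCount": 1,
        # No subnet is given, so the instance lands in the default VPC, whose
        # default subnets assign a public IP address at launch by default.
        "SecurityGroupIds": [security_group_id],
        "BlockDeviceMappings": [
            # Assumption: /dev/xvda is the root device of the chosen AMI.
            {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": disk_gib}},
        ],
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "Name", "Value": "Talend JobServer"}],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_ingress_rules(TAC_GROUP_ID), indent=2))
    print(json.dumps(build_launch_params(AMI_ID, JOBSERVER_GROUP_ID), indent=2))
```

To apply these values, you could pass the dictionaries to a boto3 EC2 client, for example ec2.run_instances(**params). Passing a narrower ssh_cidr than the default follows the best practice of not opening port 22 to 0.0.0.0/0.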
