Architecture - Talend Data Management Platform 6.1

This article describes the logical and physical architecture of Talend Platform 6.1 products. This Reference Architecture provides a high-level overview and useful guidance on their installation and use.

The Talend Platform products are defined as follows:

  • Talend Data Management Platform
    • Talend Data Integration (DI)
    • Talend Data Quality (DQ)
    • Talend Data Stewardship Console (DSC)
    • Talend Data Mapper (TDM)
    • Talend Integration Cloud (TIC)
    • Meta Integration Model Bridge (MIMB)
  • Talend Data Services Platform
    • Talend Data Management Platform +
    • Talend ESB (Enterprise Service Bus)
  • Talend Big Data Platform
    • Talend Data Management Platform +
    • Talend Big Data (BD)
  • Talend Real-Time Big Data Platform
    • Talend Data Services Platform +
    • Talend Big Data (BD)
    • Big Data Services (BDS)
  • Talend MDM Platform
    • Talend Data Services Platform +
    • Talend MDM (Master Data Management)
  • Talend Data Fabric
    • Talend Real-Time Big Data Platform +
    • Talend MDM (Master Data Management)

This article will focus on the Talend Data Management Platform only.

A Reference Architecture identifies important elements involved in designing information management systems. Working from a reference architecture can increase the likelihood of achieving a project's critical success factors. The information below provides generalized designs. Your specific use case and environment may be different; however, this general understanding should provide a starting point from which to adapt.

Logical Architecture for Data Integration

The figure below shows the logical architecture for the Talend Data Management Platform, which provides a rich set of data integration and data quality features.
Key capabilities include:
  • Connects to more than 1,000 data sources and applications, thereby minimizing coding
  • Facilitates team collaboration with a shared repository
  • Simplifies deployment and tuning using advanced management and monitoring tools

Talend Software Components

Talend Data Management is bundled with the following software components:
  • Talend Studio
  • Talend Administration Center
  • Subversion Projects
  • Git Projects
  • Talend CommandLine
  • Talend JobServer
  • Talend Runtime (Runtime contains a JobServer)
  • Talend Activity Monitoring Console (AMC)
  • Talend Log Server
  • Talend Artifact Repository (Nexus)
  • Talend DQ Portal
  • Talend Data Stewardship (DSC)

Talend Studio

The Talend Studio is used to develop and build data integration jobs.

The key features are:
  • Business oriented process modelling
  • Graphical Development
  • Broadest Connectivity (900+ components)
  • Real-time debugging
  • Auto Documentation
  • Shared Artifact Repository
  • Monitoring of processes
  • Data profiling and matching

The Talend Studio uses perspectives to focus developers on various tasks. Perspectives present a set of features relevant to the current task. Features are activated by the proper license.

The perspectives are:

  • Integration
  • Mediation (Routing)
  • MDM
  • BPM
  • Profiling
  • Talend Data Mapper
  • Component Designer

The Studio is based on Eclipse 4.4 RCP (Rich Client Platform). Only Eclipse plugins allowed by the Talend license can be used within the Talend Studio. All features are license activated.

Talend Administration Center

The Talend Administration Center is a server component that enables the following:
  • Environment Configuration
  • User/Role Administration
  • Project Administration
  • Authorization
  • Tasks Scheduling and Execution (Job Conductor)
  • Monitoring
  • Recovery & Restart of Tasks

The Talend Administration Center (also commonly referred to as TAC) is a web application that can be hosted on Tomcat, JBoss and/or WebLogic. It is a fully compliant web application and comes packaged as a WAR file. It also controls the access to other Talend applications like the Talend Activity Monitoring Console, Drools Guvnor, Kibana, Talend Artifact Repository and others.

In general, only one Talend Administration Center is needed per Talend environment.

The Talend Administration Center will maintain Admin Metadata (users, project definition, authorization, scheduler tasks, configuration, etc.) within a database. This database is solely accessed by the Talend Administration Center. The database is generally small in size (less than 1 GB) even with thousands of tasks running on the Job Conductor.

The supported databases for the Talend Administration Center are H2, MySQL, Oracle, SQL Server and PostgreSQL.

For more information about the supported and recommended versions of these databases, see the Talend Data Fabric Installation Guide.

Subversion

Talend uses Subversion as a repository for Talend projects. Subversion is a software versioning and revision control system from Apache. It is distributed as free software under the Apache license. Some common distributions are:

Talend stores Jobs, connections, schema definitions, custom jars, third party libraries, and properties files in Subversion. It also provides a versioning system for Talend projects and artifacts within the projects. Developers use common functions such as Get, Checkout, and Commit transparently, without needing to work with Subversion directly.

Generally, Subversion is only needed in the development environment. The only exception is when using MDM, where a Subversion instance is needed in test and production environments to properly deploy MDM artifacts.

Note that Talend does not support the SVN Merge functionality. When two versions conflict, the user decides which version to keep, and the changes in the other version are overwritten.

Git Projects

Git is used with Drools to store rules. This usage of Git allows you to take full advantage of features including versioning, branching, and cloning repositories. Talend supports a Git backend with the same workflow as for SVN.

For a best practices guide on using Git with Talend, see also Best Practices: Using Git with Talend.

CommandLine

The CommandLine is a server component that is an exact copy of the Talend Studio running in a headless non-GUI mode. It is a key component to perform continuous integration with Talend. The CommandLine supports several modes: Server, Interactive Shell and Scripting.

The CommandLine's primary purpose is to generate Java code, compile and package job binaries for deployment onto the Job Conductor within the Talend Administration Center. It is always invoked through commands which can be sent to it. The same commands can be used as part of a shell scripting approach.

The CommandLine generally runs as a service on the same server as Talend Administration Center and/or the CI environment (for example, Jenkins or Bamboo).

Talend JobServer

The Talend JobServer is a lightweight agent used for execution and monitoring of Talend tasks deployed through the Talend Administration Center Job Conductor. It can also be used by Talend Studio users through the Distant Run function.

The Talend JobServer is a server component that runs as a service. There are no license restrictions on the number of JobServers that a customer can install. The Talend JobServer also monitors the server health (CPU, RAM, Disk Usage).

Talend Runtime

Talend Runtime is an OSGi container based on the Apache Karaf project. It allows you to deploy and execute various components and applications.

It can be used to deploy and execute all the services, routes and generic OSGi features created by the Talend Studio.

It provides the following features:

  • Embeds a JobServer agent for the execution of DI tasks
  • Administration and monitoring via JMX
  • Control of the container via direct shell, SSH or web console

It is recommended to install and configure the Talend Runtime instead of the JobServer agent if there are requirements to build services and routes. However, sometimes it may be preferred to use JobServer agent and Talend Runtime on different execution servers for separation of concerns.

Talend Activity Monitoring Console (AMC)

The Talend Activity Monitoring Console is a set of features that display information about the execution of each task. It is used in conjunction with a database consisting of 3 database tables or 3 files on disk (stats, logs and flow meter). The schema of each table/file can be extended to add more columns for extra information. Talend Jobs, if configured, will write to the Talend Activity Monitoring Console tables, and the information can then be accessed through the Talend Studio or the Talend Administration Center.

The volume of data stored in the database tables or files is directly related to the number of tasks and their frequency of execution. Developers must design additional jobs to manage the size and to perform archiving for the data within these 3 tables. Additional indexes can be added to the 3 tables to enhance their performance.
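
As an illustration of the kind of housekeeping described above, the following Python sketch moves rows older than a cutoff from one monitoring table into an archive table. It is only a sketch under stated assumptions: the table name `amc_logs`, the timestamp column `moment`, and the helper `archive_old_rows` are hypothetical, and an in-memory SQLite database stands in for the real AMC database. In a Talend environment this logic would normally be built as a Job.

```python
import sqlite3
from datetime import datetime, timedelta


def archive_old_rows(conn, table, cutoff):
    """Move rows with moment < cutoff from `table` into `<table>_archive`.

    `table` must be a trusted identifier (it is interpolated into SQL).
    Returns the number of rows removed from the live table.
    """
    archive = f"{table}_archive"
    with conn:  # commit on success, roll back on error
        # Create an empty archive table with the same columns if it is missing.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {archive} AS SELECT * FROM {table} WHERE 0"
        )
        conn.execute(
            f"INSERT INTO {archive} SELECT * FROM {table} WHERE moment < ?",
            (cutoff,),
        )
        deleted = conn.execute(
            f"DELETE FROM {table} WHERE moment < ?", (cutoff,)
        ).rowcount
    return deleted


# Demo data: hypothetical log rows that are 1, 10, 45 and 90 days old.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amc_logs (moment TEXT, job TEXT, message TEXT)")
now = datetime(2016, 1, 31)
rows = [
    ((now - timedelta(days=age)).isoformat(), "job_a", f"run {age} days ago")
    for age in (1, 10, 45, 90)
]
conn.executemany("INSERT INTO amc_logs VALUES (?, ?, ?)", rows)

# Keep 30 days in the live table; ISO-8601 strings sort chronologically.
cutoff = (now - timedelta(days=30)).isoformat()
moved = archive_old_rows(conn, "amc_logs", cutoff)
```

The same routine could be applied to each of the three tables. The 30-day retention period is an arbitrary example value.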

For more information about the compatible databases, see the Talend Data Fabric Installation Guide.

Talend Log Server

The Talend Log Server is based on Elasticsearch and Logstash (http://logstash.net/). It is used to streamline the capture and storage of logs from Talend Administration Center, MDM Server, ESB Server and Tasks running through the Job Conductor.

The Talend Log Server runs as a service, generally on the same server as the Talend Administration Center. The Kibana UI in the Talend Administration Center connects to Elasticsearch and enables the administrator/user to query and search the logs.

Artifact Repository

The Artifact Repository is a Nexus OSS instance bundled with the Talend product. It is used for the following:

  • Receive and store patches from Talend Website for deployment
  • Store published artifacts (Jobs, Services, Routes) from the Talend Studio
  • Store third party libraries needed by the Talend Studio and CommandLine

Technical Details:

  • It is a server component and runs as a service.
  • It is a web application and is accessed via a web browser.
  • There may be one or many instances of Nexus. Generally, it is recommended to have just one instance.
  • One instance of Nexus can manage several repositories. At a minimum, two repositories are needed: a snapshots repository and a releases repository.
  • It may be used to proxy public Maven repositories for third party Java libraries.
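
The split between snapshots and releases repositories follows the usual Maven convention, in which a version ending in `-SNAPSHOT` denotes work in progress. The Python sketch below shows that routing rule only. The function name `target_repository` is hypothetical, and whether a given Talend publish step applies exactly this rule is an assumption.

```python
def target_repository(version: str) -> str:
    """Return the repository a Maven-style version would be published to.

    Maven convention: versions ending in "-SNAPSHOT" are mutable
    development builds; everything else is an immutable release.
    """
    return "snapshots" if version.endswith("-SNAPSHOT") else "releases"


# Example: a development build and the release cut from it.
dev_repo = target_repository("1.2.0-SNAPSHOT")
release_repo = target_repository("1.2.0")
```

Keeping the two kinds of artifact apart means a release, once published, is not overwritten by later development builds.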

Talend Data Quality Portal and Data Quality Data Mart

Talend DQ Portal

The Talend DQ Portal allows business users to view Data Quality reports and dashboards via a web interface.

Technical Details:

  • Server component
  • It is a web application and is accessed via a web browser.
  • There can be many instances of the Talend DQ Portal, depending on business requirements.
  • Hosted on Tomcat only (can be same Tomcat as Talend Administration Center)
  • Talend Installer can install the Talend DQ Portal and Tomcat (v6) together.
  • In addition to the relational database for the report data, an HSQL database is used for environment / user management.
  • The web application server is typically run as a Service / Daemon.

Data Quality Data Mart

The Talend DQ Data Mart is a database that holds the results of the execution of data quality reports. A data quality report can be executed directly from the Talend Studio Profiling perspective, or within a data integration Job that executes a specified report.

Technical Details:

  • Server component
  • Only MySQL and Oracle databases are supported.
  • Small – Medium size database: does not hold any actual source data.
  • Evolutionary Reports: all results from all report runs
  • Basic reports: last run of report

Talend Data Stewardship Console (DSC)

The Talend Data Stewardship Console (DSC) provides a web user interface for Data Stewards to resolve issues with records and possible matches when doing data matching. The DSC can be installed standalone or as part of a Talend MDM installation. It is used by MDM for integrated and complex matching.

Technical Details:

  • Server component
  • It is a web application and is accessed via a web browser.
  • Hosted on Tomcat if standalone, or JBoss if with MDM.
  • If installed with MDM, it uses the MDM authentication system and is a child application of Talend MDM Web User Interface.
  • If installed standalone, it provides basic user management via a file.
  • Talend Installer can install DSC with Talend Administration Center or MDM.
  • The web application server is typically run as a Service / Daemon.

Physical Architecture For Data Integration

Talend recommends that customers plan for at least 2 environments: Development and Production. For larger installations, Test and User Acceptance Test environments are also recommended.

An architecture team must develop a plan for each of these environments, based on the functional requirements of the intended business use and characteristics of the planned Talend project(s). The plan should define suitable infrastructure and sizing for these environments. To assist with this architecture planning, some typical designs are described below. A few assumptions are necessary. There are many variables to consider, so it is impossible to provide a simple formula. Instead, the following examples provide general guidance based upon some realistic assumptions for a Small, Medium, or Large installation.

Talend Reference Architecture v6.1

Basic assumptions

                      SMALL            MEDIUM            LARGE
USERS                 3-5              10-18             20+
ENVIRONMENTS          DEV-TEST, PROD   DEV, TEST, PROD   DEV, TEST, UAT, PROD
PROJECTS              1-2              2-4               5+
JOBS                  5-20+            30-50+            100+
DATA                  100 MB-2 GB+     5 GB-100 GB+      250 GB+
SOURCES               1-3              5-8               10+
SOURCE/TARGET TYPES   DB, FILE         DB, FILE, BD      DB, FILE, BD, STREAM
TARGETS               5+               10+               25+
LOAD                  WEEKLY           DAILY             HOURLY
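
The assumptions above can be read as a simple lookup. The Python sketch below is one hypothetical way to encode them; it is not Talend sizing guidance. The function name `classify_installation` is invented for illustration. Values that fall between the published ranges (for example, 6 to 9 users) are treated as the smaller tier, and 2 projects is treated as SMALL; both choices are assumptions. The result is the largest tier that any single dimension reaches.

```python
TIERS = ["SMALL", "MEDIUM", "LARGE"]

# (value at which MEDIUM starts, value at which LARGE starts) per dimension.
# LARGE bounds come straight from the table; MEDIUM bounds use the table's
# MEDIUM lower bound, except projects, where the 1-2 and 2-4 ranges overlap
# and 2 is treated as SMALL (assumption).
THRESHOLDS = {
    "users": (10, 20),
    "projects": (3, 5),
    "jobs": (30, 100),
    "sources": (5, 10),
}


def classify_installation(users, projects, jobs, sources):
    """Return "SMALL", "MEDIUM" or "LARGE" for the given project profile."""
    values = {
        "users": users,
        "projects": projects,
        "jobs": jobs,
        "sources": sources,
    }
    tier_index = 0
    for name, value in values.items():
        medium_from, large_from = THRESHOLDS[name]
        if value >= large_from:
            tier_index = max(tier_index, 2)
        elif value >= medium_from:
            tier_index = max(tier_index, 1)
    return TIERS[tier_index]


# A 4-person team whose job count has grown past 100 is sized as LARGE.
example_tier = classify_installation(users=4, projects=1, jobs=120, sources=2)
```

Taking the largest tier is a deliberately conservative choice: one oversized dimension, such as job count, is enough to move the whole installation up a tier.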

Typical SMALL Reference Architecture

DEV-TEST

A typical shared Development and Test environment is shown in the diagram below. This architecture is recommended for a small team, or where a reduced hardware infrastructure is necessary.

In this example, additional Execution Server(s) may be added if there are many tasks (> 25) being scheduled at the same time, or if some Jobs require significant CPU, memory, or disk resources.
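
As a rough planning aid, the "> 25 tasks" guideline can be turned into a first estimate of how many Execution Servers to provision. The Python sketch below treats 25 concurrent tasks as a per-server capacity, which is an assumption rather than a documented limit; the function name `execution_servers_needed` is hypothetical. Jobs with heavy CPU, memory, or disk needs may justify more servers than this estimate suggests.

```python
import math

# Taken from the "> 25 tasks" guideline; using it as a fixed per-server
# capacity is an assumption made for this sketch.
TASKS_PER_SERVER = 25


def execution_servers_needed(concurrent_tasks: int) -> int:
    """Estimate the number of Execution Servers for a peak task count."""
    if concurrent_tasks < 0:
        raise ValueError("concurrent_tasks must not be negative")
    # Always provision at least one server, even for an idle schedule.
    return max(1, math.ceil(concurrent_tasks / TASKS_PER_SERVER))


# 60 tasks scheduled at the same time: ceil(60 / 25) servers.
servers_for_peak = execution_servers_needed(60)
```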

Refer to the Talend Data Fabric Installation Guide for details on supported OS, Java, Database Engines, and minimum processor, memory, and disk requirements. Servers can be either Virtual or Bare Metal depending upon company policy. Performance differences may be experienced between these options. That is, a shared VM may not perform as well as a dedicated VM or Bare Metal.

Because this architecture supports both Development and Testing operations on the same infrastructure, great care is recommended in its use and management. In particular, having a clearly defined SDLC Practice and using Context Variables is highly encouraged. Define a separate context group for each environment (DEV, TEST, and PROD).

For more information about SDLC Practice, see the Talend SDLC Best Practices Guide.

For more information about how to use contexts and variables, see the Talend Data Fabric User Guide.
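
The idea of one context group per environment can be sketched outside of the Studio as follows. This Python example is conceptual only and does not use any Talend API. The variable names (`db_host`, `db_port`, `output_dir`), the host names, and the helper functions are all hypothetical. It shows the same variables defined once per environment, with a check that no environment is missing a variable.

```python
# One group per environment; every group defines the same variable names.
CONTEXT_GROUPS = {
    "DEV": {
        "db_host": "dev-db.example.com",
        "db_port": 3306,
        "output_dir": "/data/dev/out",
    },
    "TEST": {
        "db_host": "test-db.example.com",
        "db_port": 3306,
        "output_dir": "/data/test/out",
    },
    "PROD": {
        "db_host": "prod-db.example.com",
        "db_port": 3306,
        "output_dir": "/data/prod/out",
    },
}


def load_context(environment: str) -> dict:
    """Return a copy of the variables for one environment."""
    if environment not in CONTEXT_GROUPS:
        raise ValueError(f"unknown environment: {environment!r}")
    return dict(CONTEXT_GROUPS[environment])


def groups_are_consistent(groups: dict) -> bool:
    """True if every environment defines exactly the same variable names."""
    name_sets = [set(variables) for variables in groups.values()]
    return all(names == name_sets[0] for names in name_sets)


test_context = load_context("TEST")
consistent = groups_are_consistent(CONTEXT_GROUPS)
```

Because the Job logic refers only to variable names, the same Job can run against DEV, TEST, or PROD by selecting a different group. This matters most when DEV and TEST share infrastructure.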

Workstation/Server Role Description Typical Sizing

Developer Workstation

The Talend Studio installed on a developer workstation must have network permissions to the Talend Administration Center and CI Server. The Studio does not necessarily need connectivity to the Execution Server, if security policies restrict access. Limiting access to the DEV-TEST environment may not be optimal, because it would eliminate the opportunity to run jobs remotely.

OS: Windows/Linux/MacOS

CPU: 4 Cores Minimum

RAM: 8 GB Minimum, 16 GB Recommended

Disk Size: 512+ MB Recommended

Execution Server

This is where all Talend jobs that are deployed as tasks on the Job Conductor in the Talend Administration Center will be executed for testing purposes. Remote job execution from a developer workstation allows direct testing without having to create a scheduled task in the Talend Administration Center.

OS: Windows/Linux

CPU: 4 Cores Minimum, 8 Cores recommended

RAM: 2 GB RAM Minimum, 16+ GB Recommended

Disk Size: 100+ GB

Talend Administration Center & CI Server

The Talend Administration Center (TAC) server and the Continuous Integration (CI) server can reside on the same box or be separated onto 2 separate boxes, depending upon utilization needs.

It is possible to include the SVN Repository and/or Database Engine on this server; however, ensure that the CPU, RAM, and DISK are sufficient to accommodate all of these software installations.

Talend Administration Center includes: Java 1.8 JDK; Tomcat Web UI; AMC; Kibana; Log Server

CI includes: SVN; Nexus; CommandLine; CI tools (Jenkins and/or Maven); CI-Builder

Required and optional databases are shown above. These may be installed here as well, if sufficient CPU, RAM, and DISK are available. Otherwise, an additional database server may be needed.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 16 GB Recommended

Disk Size: 1+ TB Minimum (for software, logs, & projects)

PRODUCTION

A typical Production environment is shown in the diagram below. This architecture is recommended for a small project with few Jobs, or where a reduced hardware infrastructure is necessary.

In this example, two Execution Servers distribute the load based upon CPU, memory, and disk resource requirements. More Execution Servers may be added.
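
One simple way to picture load distribution across Execution Servers is to send the next task to the server whose most heavily used resource has the most headroom. The Python sketch below illustrates that idea only. It does not describe how the Talend Administration Center selects a server, and the `ServerLoad` class, its fields, and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerLoad:
    """Utilization snapshot for one Execution Server, in percent."""
    name: str
    cpu_pct: float
    ram_pct: float
    disk_pct: float

    def busiest_resource(self) -> float:
        # A server is only as free as its most constrained resource.
        return max(self.cpu_pct, self.ram_pct, self.disk_pct)


def pick_execution_server(servers):
    """Return the server whose busiest resource is least utilized."""
    if not servers:
        raise ValueError("at least one Execution Server is required")
    return min(servers, key=ServerLoad.busiest_resource)


servers = [
    ServerLoad("exec-1", cpu_pct=80.0, ram_pct=40.0, disk_pct=30.0),
    ServerLoad("exec-2", cpu_pct=35.0, ram_pct=60.0, disk_pct=30.0),
]
chosen = pick_execution_server(servers)
```

Here `exec-2` is chosen: its busiest resource (RAM at 60%) is less loaded than the busiest resource on `exec-1` (CPU at 80%).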

Refer to the Talend Data Fabric Installation Guide for details on supported OS, Java, Database Engines, and minimum processor, memory, and disk requirements. Servers can be either Virtual or Bare Metal depending upon company policy. Performance differences may be experienced between these options. That is, a shared VM may not perform as well as a dedicated VM or Bare Metal.

Workstation/Server Role Description Typical Sizing
Execution Server

This is where all Production-ready Talend jobs are deployed as tasks using the Job Conductor in the Talend Administration Center. Talend Studio remote Job execution is highly discouraged and not shown.

OS: Windows/Linux

CPU: 4 Cores Minimum, 8 Cores recommended

RAM: 4 GB RAM Minimum, 16+ GB Recommended

Disk Size: 100+ GB

Talend Administration Center & CI Server

The Talend Administration Center (TAC) server and the Continuous Integration (CI) server can reside on the same box or be separated onto two separate boxes.

The SVN Server for Project and Library repositories may be installed on this server, if sufficient CPU, RAM, and DISK are available. Otherwise, an additional server for CI may be needed.

The required and optional databases shown above may be installed on this server, if sufficient CPU, RAM, and DISK are available. Otherwise, an additional database server may be needed.

Talend Administration Center includes: Java 1.8 JDK; Tomcat Web UI; AMC; Kibana; Log Server

CI includes: SVN; Nexus; CommandLine; CI tools (Jenkins and/or Maven); CI-Builder

Note: The Audit database is generally not installed on the Production environment.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 8 GB RAM Minimum, 16+ GB Recommended

Disk Size: 1+ TB Minimum (for software, logs & projects)

Typical MEDIUM Reference Architecture

DEVELOPMENT

A typical Development environment is shown in the diagram below. This architecture is recommended for a medium team, or where a separate Test environment should be established.

In this example, the CI Server is shared with the Test environment to reduce the hardware footprint. This assumes that the medium-sized infrastructure can support the teams' load.

Additional Execution Server(s) may be added if there are many tasks (> 25) being scheduled at the same time, or if some Jobs require significant CPU, memory, or disk resources.

Refer to the Talend Data Fabric Installation Guide for details on supported OS, Java, Database Engines, and minimum processor, memory, and disk requirements. Servers can be either Virtual or Bare Metal depending upon company policy. Performance differences may be experienced between these options. That is, a shared VM may not perform as well as a dedicated VM or Bare Metal.

Workstation/Server Role Description Typical Sizing

CI Server

Because the Continuous Integration (CI) Server in this example is shared with the Test environment, careful sizing is necessary to ensure both environments are properly supported. If the shared CI Server utilization creates a bottleneck, a separate CI server should be established for the Test environment.

CI includes: SVN; Nexus; CommandLine; CI tools (Jenkins and/or Maven); CI-Builder

Note:

This configuration supports the best practices highlighted in the Talend SDLC Best Practices Guide for continuous integration and continuous deployment.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 16 GB Recommended

Disk Size: 1+ TB Minimum (for project code)

Developer Workstation

The Talend Studio installed on a developer workstation must have network permissions to the Talend Administration Center and CI Servers. The Studio does not necessarily need connectivity to the Execution Server, if security policies restrict access. Limiting access in the DEV environment may not be optimal because it would eliminate the opportunity to run jobs remotely.

OS: Windows/Linux/MacOS

CPU: 4 Cores Minimum

RAM: 8 GB Minimum, 16 GB Recommended

Disk Size: 512+ MB Recommended

Execution Server(s)

This is where all Talend jobs that are deployed as tasks on the Job Conductor in the Talend Administration Center will be executed for development unit testing purposes. Remote job execution from a developer workstation allows direct testing without having to create a scheduled task in the Talend Administration Center.

OS: Windows/Linux

CPU: 4 Cores Minimum, 8 Cores recommended

RAM: 2 GB RAM Minimum, 16+ GB Recommended

Disk Size: 100+ GB

Talend Administration Center Server

This Talend Administration Center (TAC) server is dedicated for development purposes. It provides access to the CI Server, which maintains project code and executable objects.

The required and optional databases shown above may be installed on this server, if sufficient CPU, RAM, and DISK are available. Otherwise, an additional database server may be needed.

TAC includes: Java 1.8 JDK; Tomcat Web UI; AMC; Kibana; Log Server.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 16 GB Recommended

Disk Size: 1+ TB Minimum (for software & logs)

TEST

A typical Test environment is shown in the diagram below. This architecture is recommended for a medium team, or where Development and Testing should be split across two environments.

In this example, the CI Server is shared with the Development environment as described above. Additional Execution Server(s) may be added and dedicated for testing purposes.

Refer to the Talend Data Fabric Installation Guide for details on supported OS, Java, Database Engines, and minimum processor, memory, and disk requirements. Servers can be either Virtual or Bare Metal depending upon company policy. Performance differences may be experienced between these options. That is, a shared VM may not perform as well as a dedicated VM or Bare Metal.

Workstation/Server Role Description Typical Sizing

CI Server

Because the Continuous Integration (CI) Server in this example is shared with the Development environment, careful sizing is necessary to ensure both environments are properly supported. If the shared CI Server utilization creates a bottleneck, a separate CI server should be established for the Dev environment.

CI includes: SVN; Nexus; CommandLine; CI tools (Jenkins and/or Maven); CI-Builder

Note:

This configuration supports the best practices highlighted in the Talend SDLC Best Practices Guide for continuous integration and continuous deployment.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 16+ GB Recommended

Disk Size: 1+ TB Minimum (for project executable objects)

Execution Server(s)

This is where all Talend jobs that are deployed as tasks on the Job Conductor in the Talend Administration Center will be executed for full testing purposes. Remote job execution from a test workstation allows direct testing without having to create a scheduled task in the Talend Administration Center.

OS: Windows/Linux

CPU: 4 Cores Minimum, 8 Cores recommended

RAM: 2 GB RAM Minimum, 16+ GB Recommended

Disk Size: 100+ GB

Talend Administration Center Server

This Talend Administration Center (TAC) server is dedicated for testing purposes. It provides access to the CI Server which maintains project code and executable objects.

The required and optional databases shown above may be installed on this server, if sufficient CPU, RAM, and DISK are available. Otherwise, an additional database server may be needed.

Talend Administration Center includes: Java 1.8 JDK; Tomcat Web UI; AMC; Kibana; Log Server.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 16 GB Recommended

Disk Size: 1+ TB Minimum (for software & logs)

Test Workstation

The Talend Studio installed on a test workstation must have network permissions to the Talend Administration Center and CI Servers. The Studio does not necessarily need connectivity to the Execution Server, if security policies restrict access. Limiting access in the TEST environment may not be optimal because it would eliminate the opportunity to run jobs remotely.

OS: Windows/Linux/MacOS

CPU: 4 Cores Minimum

RAM: 8 GB Minimum, 16 GB Recommended

Disk Size: 512+ MB Recommended

PRODUCTION

A typical Production environment is shown in the diagram below. This architecture is recommended for a medium project/Job base, or where a more significant hardware infrastructure is necessary.

In this example, two Execution Servers distribute the load based upon CPU, memory, and/or disk resource requirements. More Execution Servers may be added.

The CI Server shown here is separate from the shared CI Server used by the DEV and TEST environments.

Refer to the Talend Data Fabric Installation Guide for details on supported OS, Java, Database Engines, and minimum processor, memory, and disk requirements. Servers can be either Virtual or Bare Metal depending upon company policy. Performance differences may be experienced between these options. That is, a shared VM may not perform as well as a dedicated VM or Bare Metal.

Workstation/Server Role Description Typical Sizing

CI Server

The Continuous Integration (CI) Server in this example is dedicated to the Production environment, to isolate it from the DEV and TEST environments.

CI includes: SVN; Nexus; CommandLine; CI tools (Jenkins and/or Maven); CI-Builder

Note:

This configuration supports the best practices highlighted in the Talend SDLC Best Practices Guide for continuous integration and continuous deployment.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 8 GB RAM Minimum, 16+ GB Recommended

Disk Size: 1+ TB Minimum (for project executable objects)

Execution Server(s)

This is where all Production-ready Talend jobs are deployed as tasks using the Job Conductor in the Talend Administration Center. Talend Studio Remote job execution is highly discouraged and not shown.

OS: Windows/Linux

CPU: 4 Cores Minimum, 8 Cores recommended

RAM: 4 GB RAM Minimum, 16+ GB Recommended

Disk Size: 100+ GB

Talend Administration Center Server

This Talend Administration Center (TAC) server is dedicated for production purposes. It provides access to the CI Server which maintains project code and executable objects.

The required and optional databases shown above may be installed on this server, if sufficient CPU, RAM, and DISK are available. Otherwise, an additional database server may be needed.

Talend Administration Center includes: Java 1.8 JDK; Tomcat Web UI; AMC; Kibana; Log Server.

Note: The Audit database is generally not installed on the Production environment.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 8 GB RAM Minimum, 16+ GB Recommended

Disk Size: 1+ TB Minimum (for software & logs)

Typical LARGE Reference Architecture

DEVELOPMENT

A typical Development environment is shown in the diagram below. This architecture is recommended for a large team, or where separate Test, Acceptance Test (staging), and Production environments should be established.

In this example, each environment has its own CI Server and an automated SDLC promotion/publish process is established. At least two Execution Servers provide a minimum footprint for cases where multiple developers run tasks at the same time, or where a particular Job requires significant CPU, memory, or disk resources. Developers can perform remote Job execution on either Execution Server as needed.

For more information about how to deploy CI to QA and Production environments, see the Talend SDLC Best Practices Guide.

If the team uses Talend Data Quality and/or Talend Data Stewardship Console, the corresponding database for each can be placed on a different host for performance and manageability.

Refer to the Talend Data Fabric Installation Guide for details on supported OS, Java, Database Engines, and minimum processor, memory, and disk requirements. Servers can be either Virtual or Bare Metal depending upon company policy. Performance differences may be experienced between these options. That is, a shared VM may not perform as well as a dedicated VM or Bare Metal.

Workstation/Server Role Description Typical Sizing

CI Server

The Continuous Integration (CI) Server in this example is dedicated for the Development environment.

CI includes: SVN; Nexus; CommandLine; CI tools (Jenkins and/or Maven); CI-Builder

Note:

This configuration supports the best practices highlighted in the Talend SDLC Best Practices Guide for continuous integration and continuous deployment.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 32 GB Recommended

Disk Size: 1+ TB Minimum (for project code)

Developer Workstation

The Talend Studio installed on a developer workstation must have network permissions to the Talend Administration Center and CI Servers. The Studio does not necessarily need connectivity to the Execution Server, if security policies restrict access. Limiting access in the DEV environment may not be optimal because it would eliminate the opportunity to run Jobs remotely.

OS: Windows/Linux/MacOS

CPU: 4 Cores Minimum

RAM: 8 GB Minimum, 16 GB Recommended

Disk Size: 512+ MB Recommended

Execution Server(s)

This is where all Talend jobs that are deployed as tasks on the Job Conductor in the Talend Administration Center will be executed for development unit testing purposes. Remote job execution from a developer workstation allows direct testing without having to create a scheduled task in the Talend Administration Center.

OS: Windows/Linux

CPU: 4 Cores Minimum, 8 Cores recommended

RAM: 2 GB RAM Minimum, 32+ GB Recommended

Disk Size: 250+ GB

Talend Administration Center Server

This Talend Administration Center (TAC) server is dedicated for development purposes. It provides access to the CI Server, which maintains project code and executable objects.

The required and optional databases shown above may be installed on this server, if sufficient CPU, RAM, and DISK are available. Otherwise, an additional database server may be needed.

Talend Administration Center includes: Java 1.8 JDK; Tomcat Web UI; AMC; Kibana; Log Server.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 32 GB Recommended

Disk Size: 1+ TB Minimum (for software & logs)

TEST

A typical Test environment is shown in the diagram below. This architecture is recommended for a large team, or where separate Development, Acceptance Test (staging), and Production environments should be established.

In this example, the CI servers in each environment communicate using tools such as Jenkins or the Nexus Artifact repository for an automated SDLC promotion/publish process. At least two Execution Servers provide a minimum footprint for running multiple tests against developed Jobs. Talend QA engineers can perform remote Job execution on either Execution Server as needed.

For more information about how to deploy CI to QA and Production environments, see the Talend SDLC Best Practices Guide.

Two Talend Administration Center servers are configured to provide High Availability (HA) on the Quartz Scheduler. This clustering is only available in the Talend Platform products. A shared drive is used between the two Talend Administration Center servers for storing the Job archives and logs that are generated by each task run. This allows both servers to see exactly the same configuration.

The Talend Administration Center in the Test Environment will need access to the Nexus Snapshots and Releases repositories promoted from the Development CI server to the Test CI server. Tasks should be created in the Job Conductor using the Nexus deployment functionality (not the 'Normal Task' SVN repository used in the Development environment). The Job binaries or "artifacts" will be downloaded from the CI Server.

Refer to the Talend Data Fabric Installation Guide for details on supported OS, Java, Database Engines, and minimum processor, memory, and disk requirements. Servers can be either Virtual or Bare Metal depending upon company policy. Performance differences may be experienced between these options. That is, a shared VM may not perform as well as a dedicated VM or Bare Metal.

Workstation/Server Role Description Typical Sizing

CI Server

The Continuous Integration (CI) Server in this example is dedicated for the Test environment.

CI includes: SVN; Nexus; CommandLine; CI tools (Jenkins and/or Maven); CI-Builder.

Note:

This configuration supports the best practices highlighted in the Talend SDLC Best Practices Guide for continuous integration and continuous deployment.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 32 GB Recommended

Disk Size: 1+ TB Minimum (for project code)

Developer Workstation

The Talend Studio installed on a developer workstation must have network access to the Talend Administration Center and CI Servers. The Studio does not necessarily need connectivity to the Execution Server if security policies restrict access. However, limiting access in the TEST environment may not be optimal, because it eliminates the opportunity to run Jobs remotely.

OS: Windows/Linux/MacOS

CPU: 4 Cores Minimum

RAM: 8 GB Minimum, 16 GB Recommended

Disk Size: 512+ MB Recommended

Execution Server(s)

This is where all Talend Jobs that are deployed as tasks on the Job Conductor in the Talend Administration Center are executed for development unit testing purposes. Remote Job execution from a developer workstation allows direct testing without having to create a scheduled task in the Talend Administration Center.

OS: Windows/Linux

CPU: 4 Cores Minimum, 8 Cores recommended

RAM: 2 GB RAM Minimum, 32+ GB Recommended

Disk Size: 250+ GB

Talend Administration Center Server

This Talend Administration Center (TAC) server is dedicated for Test purposes. It provides access to the CI Server, which maintains project code and executable objects.

The required and optional databases shown above may be installed on this server, if sufficient CPU, RAM, and DISK are available. Otherwise, an additional database server may be needed.

Talend Administration Center includes: Java 1.8 JDK; Tomcat Web UI; AMC; Kibana; Log Server.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 32 GB Recommended

Disk Size: 1+ TB Minimum (for software & logs)

UAT

A typical User Acceptance Test (or Staging) environment is shown in the diagram below. This architecture is recommended for a large team, or where separate Development, Test, and Production environments should be established. The purpose of this environment is the final validation of Job artifacts (binary objects) slated for deployment to Production.

In this example, the CI server contains only release-candidate Job artifacts, tested and tagged, ready for deployment into production. The UAT environment is generally expected to be identical to the Production environment. At least two Execution Servers are configured as 'Virtual Servers' (available in the Platform products only) to distribute task utilization at run-time. Additional Execution Servers can be added to the 'Virtual Server' for scalability and/or dedicated use.

For more information about how to configure virtual servers, see the Talend Administration Center User Guide.
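
The idea of a 'Virtual Server' spreading tasks across its member Execution Servers can be illustrated with a small model. This is a sketch only: this article does not describe how Talend Administration Center actually chooses a server, so the least-loaded rule used here is an assumption, and the `VirtualServer` class and the server names are hypothetical.

```python
class VirtualServer:
    """Toy model of a group of Execution Servers that share incoming tasks."""

    def __init__(self):
        # Maps each Execution Server name to its count of running tasks.
        self.running = {}

    def add_server(self, name):
        # Register a server without resetting one that is already known.
        self.running.setdefault(name, 0)

    def dispatch(self):
        """Assign one task to the server with the fewest running tasks."""
        if not self.running:
            raise ValueError("no Execution Servers registered")
        # Ties are broken by registration order (dicts keep insertion order).
        name = min(self.running, key=self.running.get)
        self.running[name] += 1
        return name


uat = VirtualServer()
uat.add_server("exec-1")
uat.add_server("exec-2")
print([uat.dispatch() for _ in range(3)])  # ['exec-1', 'exec-2', 'exec-1']

# Adding a server for scalability: it receives the next task because it is idle.
uat.add_server("exec-3")
print(uat.dispatch())  # exec-3
```

The model shows why adding an Execution Server to the group increases capacity without changing how tasks are defined: tasks target the group rather than an individual server.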

Like the Test environment, two Talend Administration Center servers are configured to provide High Availability (HA) on the Quartz Scheduler.

The Talend Administration Center in the UAT Environment will need access to the Nexus Releases repositories promoted from the Test CI server only. Tasks should be created in the Job Conductor using the Nexus deployment functionality (not the 'Normal Task' SVN repository used in the Development environment). The Job binaries or "artifacts" will be downloaded from the CI Server.

Refer to the Talend Data Fabric Installation Guide for details on supported OS, Java, Database Engines, and minimum processor, memory, and disk requirements. Servers can be either Virtual or Bare Metal depending upon company policy. Performance differences may be experienced between these options. That is, a shared VM may not perform as well as a dedicated VM or Bare Metal.

Workstation/Server Role Description Typical Sizing

CI Server

The Continuous Integration (CI) Server in this example is dedicated for the User Acceptance Test (or Staging) environment.

CI includes: SVN; Nexus; CommandLine; CI tools (Jenkins and/or Maven); CI-Builder.

Note:

This configuration supports the best practices highlighted in the Talend SDLC Best Practices Guide for continuous integration and continuous deployment.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 16 GB Recommended

Disk Size: 1+ TB Minimum (for project code)

Execution Server(s)

This is where all Talend Jobs that are deployed as tasks on the Job Conductor in the Talend Administration Center are executed for final Job testing and validation purposes.

OS: Windows/Linux

CPU: 4 Cores Minimum, 8 Cores recommended

RAM: 2 GB RAM Minimum, 64+ GB Recommended

Disk Size: 250+ GB

Talend Administration Center Server

This Talend Administration Center (TAC) server is dedicated for UAT purposes. It provides access to the CI Server, which maintains project executable objects only.

The required and optional databases shown above may be installed on this server, if sufficient CPU, RAM, and DISK are available. Otherwise, an additional database server may be needed.

Talend Administration Center includes: Java 1.8 JDK; Tomcat Web UI; AMC; Kibana; Log Server.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 8 GB RAM Minimum, 32 GB Recommended

Disk Size: 1+ TB Minimum (for software & logs)

PRODUCTION

A typical Production environment is shown in the diagram below. This architecture is recommended for a large project/Job base, or where a highly scalable hardware infrastructure is necessary.

In this example, the CI server contains only Job artifacts released to production. "Virtual Servers" (available in the Platform products only) are used with a shared disk to provide scalability without impact on Job tasks.

For more information about how to configure virtual servers, see the Talend Administration Center User Guide.

Like the UAT environment, two Talend Administration Center servers are configured to provide High Availability (HA) on the Quartz Scheduler.

The Talend Administration Center in the PROD Environment will need access to the Nexus Releases repositories promoted from the UAT CI server only. Tasks should be created in the Job Conductor using the Nexus deployment functionality. The Job binaries or "artifacts" will be downloaded from the CI Server.
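
The promotion rules described for the TEST, UAT, and PROD environments can be summarized as a lookup. The sketch below is illustrative only: the dictionary names, the `can_deploy` helper, and the lowercase repository labels are hypothetical and do not correspond to any Talend or Nexus API.

```python
# Nexus repositories that each environment's Talend Administration Center
# needs access to, as stated in the TEST, UAT, and PROD sections.
REPOSITORIES_BY_ENV = {
    "TEST": {"snapshots", "releases"},
    "UAT": {"releases"},
    "PROD": {"releases"},
}

# CI server from which each environment's artifacts are promoted.
PROMOTED_FROM = {
    "TEST": "DEV",
    "UAT": "TEST",
    "PROD": "UAT",
}


def can_deploy(env, repository):
    """Return True if a task in `env` may use an artifact from `repository`."""
    return repository in REPOSITORIES_BY_ENV[env]


print(can_deploy("TEST", "snapshots"))  # True
print(can_deploy("PROD", "snapshots"))  # False
print(PROMOTED_FROM["PROD"])            # UAT
```

Keeping snapshot artifacts out of UAT and PROD means that only tested, tagged releases can reach the environments closest to production.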

Refer to the Talend Data Fabric Installation Guide for details on supported OS, Java, Database Engines, and minimum processor, memory, and disk requirements. Servers can be either Virtual or Bare Metal depending upon company policy. Performance differences may be experienced between these options. That is, a shared VM may not perform as well as a dedicated VM or Bare Metal.

Workstation/Server Role Description Typical Sizing

CI Server

The Continuous Integration (CI) Server in this example is dedicated for the Production environment.

CI includes: SVN; Nexus; CommandLine; CI tools (Jenkins and/or Maven); CI-Builder.

Note:

This configuration supports the best practices highlighted in the Talend SDLC Best Practices Guide for continuous integration and continuous deployment.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 6 GB RAM Minimum, 16 GB Recommended

Disk Size: 1+ TB Minimum (for project code)

Execution Server(s)

This is where all Talend Jobs that are deployed as tasks on the Job Conductor in the Talend Administration Center will be executed in 'Production' mode.

OS: Windows/Linux

CPU: 4 Cores Minimum, 8 Cores recommended

RAM: 2 GB RAM Minimum, 64+ GB Recommended

Disk Size: 250+ GB

Talend Administration Center Server

This Talend Administration Center (TAC) server is dedicated for PROD purposes. It provides access to the CI Server, which maintains project executable objects only.

The required and optional databases shown above may be installed on this server, if sufficient CPU, RAM, and DISK are available. Otherwise, an additional database server may be needed.

Talend Administration Center includes: Java 1.8 JDK; Tomcat Web UI; AMC; Kibana; Log Server.

OS: Windows/Linux

CPU: 4 Cores Minimum

RAM: 8 GB RAM Minimum, 32 GB Recommended

Disk Size: 1+ TB Minimum (for software & logs)

Physical Architecture For Data Quality

Talend recommends that the Data Quality and Data Stewardship Console environments be considered in the overall sizing of hardware and resources. These features provide web-based data quality profiling, monitoring, and governance for all Talend Platform products.

For more information, see the Talend DQ Portal User and Administrator Guide and the Talend Data Stewardship Console User Guide.

Data Quality/Data Stewardship Console

The diagram below shows the main components needed for the data quality features of the platform. This architecture may be replicated in the DEV, TEST, UAT, and PROD environments as needed. Each environment follows the same architecture design, which includes:
  • 2 Databases/Schemas (e.g., DQP611 and DSC611)
  • 1 Server that will host the Talend DQ Portal and the Talend Data Stewardship Console web applications. The two web applications will need to be accessible to business users and data stewards. Hence, the security requirements on this server may be different, due to access requirements for users other than developers and administrators.
    • For a SMALL reference architecture this may be installed on the DEV-TEST and PROD servers without much impact.
    • For a MEDIUM reference architecture this may be installed on the DEV and TEST servers; however separate servers may be needed for UAT and PROD.
    • For a LARGE reference architecture this may need to be installed on separate servers for each of the DEV, TEST, UAT, and PROD environments.
Note:

Where the Talend DQ Portal and Talend Data Stewardship Console share a server with other components, the sizing recommendations below may need to be added to the guidelines described above, depending on how heavily the two applications are used in that environment.

Workstation/Server Role Description Typical Sizing

Talend DQ Portal

This server will host the Talend DQ Portal and the Talend Data Stewardship Console web applications.

OS: Windows/Linux (See Installation Guide)

CPU: 8 Cores Minimum

RAM: 16 GB RAM

Disk Size: 100 GB
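
The placement guidance for SMALL, MEDIUM, and LARGE reference architectures, together with the note on cumulative sizing, can be sketched as follows. This is a hedged illustration: the names `DQ_PLACEMENT`, `DQ_SIZING`, and `shared_host_sizing` are hypothetical, the "separate" entries for MEDIUM reflect guidance that separate servers "may be needed", and simply adding the figures is a worst-case assumption rather than a Talend formula.

```python
# Where the DQ Portal / Data Stewardship Console web applications are hosted,
# per reference architecture size, following the list above.
DQ_PLACEMENT = {
    "SMALL": {"DEV-TEST": "shared", "PROD": "shared"},
    "MEDIUM": {"DEV": "shared", "TEST": "shared",
               "UAT": "separate", "PROD": "separate"},
    "LARGE": {"DEV": "separate", "TEST": "separate",
              "UAT": "separate", "PROD": "separate"},
}

# Typical sizing for the Talend DQ Portal server from the table above.
DQ_SIZING = {"cores": 8, "ram_gb": 16, "disk_gb": 100}


def shared_host_sizing(base):
    """Add the DQ sizing to the sizing of the host that the applications share.

    Assumption: resources are summed, which is the most conservative reading
    of the note that the recommendations may need to be cumulative.
    """
    return {key: base[key] + DQ_SIZING[key] for key in DQ_SIZING}


# Hypothetical shared host sized at 4 cores, 6 GB RAM, and 1000 GB disk.
base_host = {"cores": 4, "ram_gb": 6, "disk_gb": 1000}
print(DQ_PLACEMENT["MEDIUM"]["UAT"])  # separate
print(shared_host_sizing(base_host))
# {'cores': 12, 'ram_gb': 22, 'disk_gb': 1100}
```

Actual requirements depend on how many business users and data stewards use the two web applications, so the summed figures are best read as an upper starting point.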