Best Practice: Talend ESB

Version: 6.4, 6.3, 6.2, 6.1, 6.0, 5.6
Product: Talend Open Studio for ESB, Talend ESB, Talend Data Fabric, Talend Real-Time Big Data Platform, Talend MDM Platform, Talend Data Services Platform
Task: Design and Development
Platform: Talend Studio, Talend ESB

Introduction

The best practices described in this document are techniques that have consistently shown results superior to those achieved by other means. Talend recommends the use and adoption of the guidelines described in this document to expedite your development. The best practices can be used as a benchmark to measure quality and conformity of work produced by developers.

The best practices described in this document will be revised and extended with every new release of the Talend ESB product.

The purpose of this document is to provide standards and best practices around the creation of mediation routes and web services. It is intended to be a working document that is added to and trimmed as best practices are updated or superseded. The standards in this document should be followed whenever a route or web service is being built. However, there is some flexibility with regard to the best practices. It is advised that these be followed, but in some cases it may be beneficial to do something different. Developers should use the best practice section as a starting point but stick strictly to the standards.

Mediation Best Practices

This section describes the best practices that should be followed when building Mediation routes.

Readability

The following are suggestions for best practice in order to aid readability and reduce complexity.

Component Names

Do not leave component names unchanged. Always give them a useful name that indicates how they are being used.

Although you can use the same name for several components, avoid doing so in most cases: components with the same name can overwrite each other’s settings.

Where a component is a pure copy of another with no changes then it is OK (and might make sense) to leave the names the same.

Generic Components

While it is possible to “reinvent the wheel” by using generic components to do everything, it is recommended that dedicated components be used where possible.

For example, it is possible to recreate the functionality of the cTimer component using the cMessagingEndpoint component. This should not be done without a good reason. Using the dedicated components means that potential errors are more likely to be caught and the readability of the routes will be better as the images for these components can help identify their purpose.

Keep Complexity Low

While it is possible to achieve many things in one very long and complex route, it makes it very hard to read and leaves the route prone to unhandled errors. If your route becomes complex, try to break it down into several subroutes. Remember when doing this that you may need to keep control of the order of processing, so use suitable endpoints (“direct” for synchronous behavior and “seda” for asynchronous behavior).

Subroutes should be arranged in the order in which you expect them to be processed, reading from top to bottom and from left to right. The main route (usually the route that receives the initial message) should be at the top of the route design.

Unconnected Components

Don’t scatter unconnected components such as cConfig and cJMSConnectionFactory components all over the place in your route design. Unconnected components that occur in all routes should be placed in the top left corner of the route design, and route-specific components should be placed just below these on the left. As people generally read from left to right and from top to bottom, this is a natural place to position these key components.

Always Check the Camel Documentation First

When designing a route, always check the Camel documentation (http://camel.apache.org/components.html) to find out what is available for you to use. In many cases this will save a lot of time and work and will provide better performance.

For example, a developer wanted to move a file from an input folder to a working directory and designed this step as a separate route with two cFile components. This approach had a couple of major drawbacks.

Instead of moving the file via a fast file system operation, the file was read into a stream (the first cFile component) and then written from that stream into a second file (the second cFile component). This basically meant making a copy of the file and then deleting the original.

A look into the cFile component documentation would have told the developer to use the preMove setting of the file component, which would have been much faster and the whole route design would have been much simpler.
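
The difference between the two approaches can be illustrated in plain Java, outside of Camel. This is only a sketch: the class FileMoveSketch and its methods are hypothetical names chosen for the illustration, and it does not show how the cFile component is implemented. The first method asks the file system to move the file, which on the same file store is typically a rename; the second streams the whole content and then deletes the original, which is what two chained file components amount to.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustration only: contrasts a file system move with a copy-then-delete "move". */
final class FileMoveSketch {

    /** Moves the file with one file system call; on the same file store this is typically a rename. */
    static Path moveByRename(Path source, Path targetDir) {
        try {
            Files.createDirectories(targetDir);
            return Files.move(source, targetDir.resolve(source.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Streams every byte into a second file, then deletes the original. */
    static Path moveByCopy(Path source, Path targetDir) {
        try {
            Files.createDirectories(targetDir);
            Path target = targetDir.resolve(source.getFileName());
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(source);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temporary input file, moves it, and returns the content found at the new location. */
    static String demo(boolean rename) {
        try {
            Path inputDir = Files.createTempDirectory("input");
            Path file = Files.write(inputDir.resolve("data.txt"),
                    "hello".getBytes(StandardCharsets.UTF_8));
            Path workDir = inputDir.resolve("working");
            Path moved = rename ? moveByRename(file, workDir) : moveByCopy(file, workDir);
            if (Files.exists(file)) {
                throw new IllegalStateException("original file still present");
            }
            return new String(Files.readAllBytes(moved), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both variants end with the same result, but the copy variant costs grow with the file size, whereas the rename does not need to read the content at all.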

Use Folders and Sensible Route Names

If you are creating a lot of routes it can get very confusing which is which and what they are for if you do not have a good and consistent naming convention. As well as making sure the routes are named in a logical manner it is also recommended that folders are used to separate routes into groups. The grouping method very much depends on the project but should be decided upon during the design stage.

Reusability and Scalability

The following are recommendations to improve reusability and scalability.

Divide and Conquer

Do not try to model all use cases as single, highly complex routes. Break a use case down into smaller, bite-sized chunks that you may be able to reuse across several use cases.

While this might take a little more time to design it will reduce the build time and improve the scalability and reusability.

Remember, routes in the same virtual machine can pass messages via endpoints (vm or direct-vm). If the routes needing to communicate are not in the same virtual machine you can use message queues as a way of communicating.

Do Not Expect Anything

The more decoupled your routes become the less you can expect from the exchanges. In order to protect against unforeseen exceptions creeping into the system always check for the data that you are expecting before attempting to use it.

For example, if you are expecting a particular XML format in the body of a message, validate it against an XSD file to check that it is in the correct format before passing it further along the route. Use cTry components to catch potential problems such as this.
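
As a minimal sketch of such a check, the following helper validates an XML string against an XSD using only the standard javax.xml.validation API. The class name XmlBodyCheck and the method isValid are hypothetical; in a real route the schema would normally be loaded from a file and the check would sit inside the error handling described above.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

/** Illustration only: checks a message body before it is passed further along. */
final class XmlBodyCheck {

    /** Returns true when the XML text conforms to the given XSD text. */
    static boolean isValid(String xml, String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // A SAXException means the document (or the schema) is not valid;
            // an IOException means it could not be read. Either way the body is not usable.
            return false;
        }
    }
}
```

A caller would typically stop processing, or divert the message to an error subroute, when isValid returns false.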

Avoid Camel dependencies

Where possible, try to avoid using Camel-specific classes such as org.apache.camel.Exchange. Camel provides automatic type conversion, which helps you to develop Java beans that are independent of the Camel packages.

If Camel-specific values are required, using annotations is the second-best approach. Working with the org.apache.camel.Exchange object should be avoided as much as possible, because changes in the Exchange object itself can easily lead to undesired behaviour.
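
A bean written in this style might look like the following sketch. The class OrderNormalizer and its method are hypothetical; the point is only that the class imports nothing from org.apache.camel, so it can be compiled and unit tested without Camel, while Camel's bean binding and type conversion can supply the message body as the String argument.

```java
import java.util.Locale;

/** Illustration only: a bean with no dependency on Camel classes. */
final class OrderNormalizer {

    /**
     * Trims and upper-cases an order reference.
     *
     * @param orderRef the raw reference, possibly null
     * @return the normalized reference, or an empty string for null input
     */
    public String normalize(String orderRef) {
        return orderRef == null ? "" : orderRef.trim().toUpperCase(Locale.ROOT);
    }
}
```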

Improve Extensibility

The following are recommendations to improve extensibility.

Multiple Entry Points

As the system matures you will find that entry points can change over time. Where in the beginning a flat comma separated file may have been the initial source of data, this can change format to be XML, JSON or even change type to become a message in a queue or an email.

To accommodate this it is a good idea to separate the entry point from the main flow of a route and decide upon a common format to share between routes after the data has entered the system.

For example, you might choose to create POJOs (Plain Old Java Objects) to share the data between routes and subroutes as they make the data contained very easy to access. Alternatively XML or JSON may be chosen. You might decide to pass POJOs between routes and XML for when you are passing data to Data Integration jobs. Whatever is decided it should be consistent throughout the system.

Flexible Persistence

If you need to persist data within your route, try to make this as flexible as possible. Use a subroute for persisting data, so that you can easily change the persistent storage from the file system to a database, or vice versa. Also try to keep all persistence-dependent tasks and steps within this subroute, so that you do not need to worry about special dependencies when changing the persistent storage.

Use Built in Libraries

Studio provides several 3rd party libraries which can be used in designing a route. These libraries also contain all further (internal) dependencies. Providing all indirect dependencies yourself can become a rather extensive task, so try to find functionality in the supplied libraries before looking elsewhere.

Miscellaneous

The following are general recommendations to make building and designing routes easier.

Use Context Variables

Use context variables whenever possible and reuse those variables whenever it is appropriate. If, for example, a message queue needs to be used, then its location, port number and name should be set up in context variables so that they can be maintained in one place. If you are sharing any values between routes (endpoint names, passwords, queues, URIs, etc.), these should be stored in context variables.

Preserve exchange body while calling a bean

Sometimes a Java bean needs to be called, but you do not want the method response to replace the current exchange body. Using the cBean component would cause this undesired behaviour.

Therefore you should not use the cBean component (in this case), but rather a cSetHeader component. The return value of the Java bean would not change the body, but would only set a header field which could be easily ignored, while the exchange body would remain untouched.

If you need to call a specific bean method within the cSetHeader component, type the method name after the bean class name, separated by a comma (e.g. beans.MyBean, myMethod).

Avoid instantiating a class for each message

Sometimes you will need to do some work on a message which cannot be done by generic components or by using a Talend DI Job. To do this you might choose Java and might want to encapsulate the code in a class or a bean. This is a very powerful mechanism and can be very useful, but can also cause performance issues if it is not implemented in an efficient manner.

If you have a route that might process thousands of messages an hour, having a new class instantiated for every message is not very efficient in terms of memory usage. Doing this inefficiently can increase the likelihood of facing the “Exception in thread "main" java.lang.OutOfMemoryError: Java heap space” exception. In order to avoid this it is a good idea to build your classes so that they do not need to be instantiated for every message. If possible instantiate them at the beginning of the route and store them in a registry.

Talend provides a component for doing this called the cBeanRegister component. This component registers the created object and makes it available at any point in the route. If it is a simple bean, you can reference it in the many ways described in the documentation using the inbuilt functionality. If you need to use it in a cProcessor component, you can retrieve it using a variation on the following code:

//Get Camel Context
Map<String, CamelContext> contextMap = getCamelContextMap();
DefaultCamelContext dcc = (DefaultCamelContext)contextMap.get(jobName+"-ctx");   

//Get the object stored in the registry 
MyObject myObject = (MyObject)dcc.getRegistry().lookup("myObjectReference");

Doing this can dramatically improve the memory usage and performance of your routes.
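
The effect of registering an object once and looking it up for every message can be sketched in plain Java. Everything in this example is a hypothetical stand-in: RegistrySketch uses a simple map in place of the route registry, and MessageFormatter represents whatever helper class the route needs. Note that an object shared in this way is used by every message, so it should be stateless or otherwise thread-safe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustration only: one registered instance serves every message. */
final class RegistrySketch {

    /** Stand-in for a helper class; counts how often it is constructed. */
    static final class MessageFormatter {
        static final AtomicInteger INSTANCES = new AtomicInteger();

        MessageFormatter() {
            INSTANCES.incrementAndGet();
        }

        String format(String body) {
            return "[" + body + "]";
        }
    }

    /** Hypothetical stand-in for the registry that the route would provide. */
    private static final Map<String, Object> REGISTRY = new ConcurrentHashMap<>();

    /** Called once, when the route starts. */
    static void register(String name, Object bean) {
        REGISTRY.put(name, bean);
    }

    /** Called for every message; looks the shared instance up instead of creating a new one. */
    static String process(String body) {
        MessageFormatter formatter = (MessageFormatter) REGISTRY.get("formatter");
        return formatter.format(body);
    }
}
```

Processing a thousand messages through process after a single call to register constructs MessageFormatter once, rather than a thousand times.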

Behaviour Analysis

If you want to test a component for the first time, or if you encounter problems within your route that you cannot solve easily, create a new test route and focus only on the specific component or problem. Keep this example as simple as possible to avoid errors caused by misconfiguration at another (unseen) point. This usually helps a lot when looking for the cause of an error or learning how best to handle a new component.

Use cLog at all interesting points of your route, to make sure the content of your route is still what you expect it to be.

Use Component Specific Headers

Some components, such as cHTTP, are aware of specific header fields (e.g. org.apache.camel.Exchange.HTTP_PATH). If such a header is set, it overrides the default configuration of the component itself. This is helpful when a value is only known at runtime. If you use such a component-specific header, set it to the required value just before calling the component, and set it to null right after that component.

If you don't do this, calling a similar component again later in the route (or within a subroute) could cause unexpected behaviour. To avoid the burden for each subroute to test whether or not any component specific header values are set, just remember to always reset these types of header values, right after they have been used.

Disappearing messages

If your exchange body is a stream, you cannot read this stream twice (by default). So if, for example, you want to print the content of a stream to your log file but also process the stream within a following component, you should use the cConvertBodyTo component to change the body type from a stream to, for example, a String. A String can be read as often as you need.

Always keep in mind the lifespan of the data that you are using and passing on.
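
The behaviour of a stream body can be reproduced with a few lines of plain Java. This is a sketch with a hypothetical helper class, StreamOnceSketch, and it does not involve Camel: the first read returns the content, while a second read of the same stream finds nothing left.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustration only: a stream can be consumed once, a String as often as needed. */
final class StreamOnceSketch {

    /** Reads everything that is left in the stream and returns it as UTF-8 text. */
    static String drain(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling drain twice on the same stream returns the payload the first time and an empty string the second time, which is how a message appears to vanish after it has been logged. Keeping the result of the first call as a String avoids the problem.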

Write Documentation

Each component has a Documentation tab. If you feel that an explanation as to what that component is doing would help someone, fill it in here.

There is also a Show Information tick box on this tab. If you select this it shows to those reading the route that there is information to read there for the component.

Documentation is always a chore so it makes sense to do it as you build. Filling in these Documentation tabs can really help others and is vastly less work than writing a complete document for each route.

If you do have to write a document for each route, these notes will help when you come to write it, as there is seldom much time for documentation while you are actually building routes. Having these notes will save a lot of investigation for you and others in the future.

Calculate somewhere else

There will be times when you will need to process data in messages. This can be done in code in cProcessor components but not everybody is comfortable with reading/writing code.

You will also find that sometimes the same processing may be needed across several routes. In order to make this processing reusable it is good practice to package this logic up in a Data Integration Job that can be shared amongst all areas that need that bit of logic. Data can be supplied in exchange messages as XML and Headers (for example) which DI can easily consume and output.

Learn Java or get used to reading it

Talend ESB is a code-generating piece of software. Sometimes when you have a bug in your route, it will be because of how the code is generated. Maybe the way the components have been connected is not handled very well, or it does not make sense to connect the components that way according to the Camel framework.

The best way to identify these issues and find workarounds is to be able to read the Java error stack and use it to point you toward the line of code that is at fault in the generated code (by using the Code tab). This will save hours of searching forums.

Synchronous or Asynchronous Endpoints

There will be plenty of times when you need to send data between endpoints. There are many ways of doing this (direct, vm, direct-vm, seda) and you should work out which is the best for your route before implementing it.

Sometimes you may want the main body of the route to finish quickly but have some other processing in a subroute where you do not mind how long it takes. In this situation you should use a seda or vm endpoint as these are asynchronous.

However, if every subroute is required to have finished before the result of the main route is returned, a direct or direct-vm endpoint should be used. This will make the route slower but ensures that all processing has completed.

Web Service Best Practices

This section describes the best practices that should be followed when building web services.

Selection of Service Type

The first decision that needs to be made when designing Web services is what type of service it needs to be (REST or SOAP). The following should be considered before making this decision.

Should the service be Stateless or Stateful

A stateless system can be seen as a black box where, at any point in time, the value of the outputs depends only on the value of the inputs.

A stateful system can be seen as a box where, at any point in time, the value of the outputs depends on the value of the inputs and on an internal state. So basically a stateful system is like a state machine with memory, as the same set of inputs can generate different outputs depending on the previous inputs received by the system.

This is an important distinction to make when deciding on a type of service. If your service needs to be stateful, then SOAP is the type of service you need. A real-world example of where SOAP is preferred over REST can be seen in the banking industry, where money is transferred from one account to another. With SOAP, a bank can perform a transaction on an account and, if the transaction fails, the SOAP stack (through standards such as WS-ReliableMessaging) can retry it automatically to ensure that the request is completed. With REST, failed service calls must be handled by the requesting application.

What operations need to be performed

What does the service need to do? If it simply needs to carry out CRUD operations (Create, Read, Update or Delete) then REST is a good choice. It is lightweight, easy to construct the call (for the consumer), can make use of caching to reduce the load for regular calls and returns human readable responses. If your operations are more complex and need to stick to a strict contract, then SOAP is the better choice.

Must the Service Type be consistent

Is it architecturally important for the service type to be consistent across the system? This is an important decision, because if it is, you will probably need to select SOAP unless all you are carrying out are simple CRUD or stateless operations.

However, if a mix of service types is permitted, this allows a lot of flexibility and can vastly reduce the effort of implementing the whole system. A choice that is often made in systems where a mixture of service types is permitted is to use REST for simple read operations and SOAP for complex operations and operations where data changes may occur.

Security

In the majority of cases REST and SOAP security systems are the same: some form of HTTP-based authentication plus Secure Sockets Layer (SSL).

However, a SOAP service also supports end-to-end message security. This means that if you pass SOAP messages from endpoint to endpoint to endpoint, over the same or different protocols, the message remains secure. If your system needs this particular feature, SOAP is definitely the way to go.

It should be noted that security is a large domain and far too complex to decide upon based on a couple of paragraphs. The point here is to say that while underlying REST and SOAP security systems are largely the same, SOAP has provision for intermediary security that REST does not.

Readability

Readability best practices for Web Services are practically the same as for the Mediation routes.

For more information, see the Readability section for Mediation routes.

Reusability and Scalability

The following are recommendations to improve reusability and scalability.

Divide and Conquer

Very similar to the section with the same title in the Mediation section, do not try and model all use cases as single and highly complex Web services.

The services should be broken down into their most atomic parts. There is no need to expose these atomic services to the outside world, but they can be used by other services to build up a more complex one which you will expose.

Do Not Expect Anything

The thing about services is that you have to expect to not be able to expect anything from the caller. It might be another system, an experienced developer, someone with a bit of knowledge or someone that has found it by mistake and wants to give it a try.

Obviously, in systems with built-in security you do not need to worry so much about the person who finds the service by mistake. It is still important, no matter who or what is expected to use the service, that you always check the data that you are receiving before attempting to use it.

Miscellaneous

There are many overlaps between best practices for Web Services and Mediation routes in general. Many of the miscellaneous best practices for Web Services have been covered in the Mediation routes Miscellaneous section.

For more information, see the Miscellaneous section for Mediation routes.

There are also a few other cases especially for Web Services which are described below.

Always return something

It is good practice to ensure that when a Web service is called that something is always returned. In many cases a return of data will be expected. But in some cases there may not actually be an expected return. No matter whether a response is needed, there should always be a response returned indicating a success or failure. There should also be a mechanism to ensure that errors are reported.

Use standard HTTP web codes

When returning statuses, ensure that where possible standard HTTP Web codes are used.

For a list of these codes, see http://en.wikipedia.org/wiki/List_of_HTTP_status_codes.
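
One way to keep status codes consistent across services is to map every service outcome to a standard code in a single place. The sketch below assumes a hypothetical Outcome enumeration and class name; the numeric values come from the constants in java.net.HttpURLConnection.

```java
import java.net.HttpURLConnection;

/** Illustration only: every outcome, including failure, maps to a standard HTTP status. */
final class ServiceStatus {

    /** Hypothetical set of outcomes that a service might report. */
    enum Outcome { OK, CREATED, INVALID_INPUT, NOT_FOUND, FAILURE }

    /** Returns the standard HTTP status code for the given outcome. */
    static int toHttpStatus(Outcome outcome) {
        switch (outcome) {
            case OK:
                return HttpURLConnection.HTTP_OK;             // 200
            case CREATED:
                return HttpURLConnection.HTTP_CREATED;        // 201
            case INVALID_INPUT:
                return HttpURLConnection.HTTP_BAD_REQUEST;    // 400
            case NOT_FOUND:
                return HttpURLConnection.HTTP_NOT_FOUND;      // 404
            default:
                // Anything unexpected is still reported to the caller.
                return HttpURLConnection.HTTP_INTERNAL_ERROR; // 500
        }
    }
}
```

Because the default branch returns a server error rather than nothing, the caller always receives a response, in line with the previous recommendation.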

Database connection pooling

Web services are highly available and therefore can cause problems for any databases they need to connect to if they are forced to open a new connection every time they need to interrogate them. A way around this is to use a connection pool.

At present Talend only supports connection pooling using a JDBC connection. Therefore it is considered best practice to make use of the JDBC database components when working with databases via services.

Mediation Route Standards

This section describes the standards that should be followed when building Mediation routes. Due to the nature of route development the standards are relatively light.

This section will also cover several related development standards such as naming conventions, variable usage and Java coding standards.

Context Variables

Context variables should be used in place of hardcoded parameters across the ESB system. Context groups should be set up in the development environment to ensure that common variables are reused by all developers. Different context groups should be set up to contain related variables.

For example, there should be a context group that will only contain variables directly related to the error and/or logging handling functionality. These variables should be used by all routes. But there should also be contexts set up for other groups that routes can fit into like project, business area, route types, etc. These should be decided upon as early as possible. Developers can set up context variables that are specific to individual routes, but this should only be done where absolutely necessary.

The naming convention for context variables should be “meaningful names in lower camel case”.

Naming Conventions

The naming conventions for the Routes should be as follows:

  • Route names start with ro_.
  • The rest of the name should be in upper camel case following a consistent project wide format.

An example of a route name for a Remittance project that validates an input XML schema might be ro_RemittanceValidateInputSchema.
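
Conventions like these are easy to check automatically, for example during a review. The following sketch uses regular expressions that are only an approximation of the rules above (a pattern cannot tell whether a name is meaningful), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Illustration only: approximate checks for the naming conventions. */
final class NamingCheck {

    /** "ro_" followed by a name that starts with an upper-case letter. */
    private static final Pattern ROUTE_NAME = Pattern.compile("ro_[A-Z][A-Za-z0-9]*");

    /** A name that starts with a lower-case letter, as used for context variables. */
    private static final Pattern CONTEXT_VARIABLE = Pattern.compile("[a-z][A-Za-z0-9]*");

    static boolean isRouteName(String name) {
        return ROUTE_NAME.matcher(name).matches();
    }

    static boolean isContextVariableName(String name) {
        return CONTEXT_VARIABLE.matcher(name).matches();
    }
}
```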

All names should be approved by the lead developer/team and should be specified in the design to keep a consistent naming approach for the project.

Java Standards

It will be necessary to write some Java from time to time in order to meet certain requirements.

This section describes some standards that should be followed in order to ensure that code is as reusable, efficient and readable as possible. Generally, the standards laid out in this document (http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html) should be followed.

Below are a few key areas that must be followed.

Comments

The most important of all of the Java standards for readability are the comment standards. ALL code should be well documented. It must not be assumed that another developer will be able to interpret the code.

Some people are better at coding than others, and as the Talend ESB tool is not exclusively a coding tool, it is important to make sure that everyone has a chance of being able to work out what is happening.

All classes and methods should have Javadoc style comments (http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html) and inline comments should be used where appropriate.
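
As a short illustration of this standard, the following hypothetical utility class carries a Javadoc comment on the class and on the method, plus an inline comment where the intent is not obvious from the code alone. The class and its behaviour are invented for the example.

```java
import java.util.Locale;

/**
 * Formatting helpers for monetary amounts (illustration only).
 */
final class AmountUtils {

    private AmountUtils() {
        // Utility class; not meant to be instantiated.
    }

    /**
     * Converts an amount in minor units (for example pence) to a display string.
     *
     * @param minorUnits the amount in minor units; may be negative
     * @return the amount with two decimal places, for example "12.05"
     */
    static String formatMinorUnits(long minorUnits) {
        // Work on the absolute value so that the sign is emitted exactly once.
        long abs = Math.abs(minorUnits);
        String sign = minorUnits < 0 ? "-" : "";
        return sign + (abs / 100) + "." + String.format(Locale.ROOT, "%02d", abs % 100);
    }
}
```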

Code Format

Everybody has their own preference for code formatting. For this project a rule of thumb should be that it must be as readable as possible. A good way of ensuring this is to use tabs to format your code using one tab for each nested layer of code.

An example of a simple nested IF condition is shown below.

//A simple IF to set testValue to a new value according to its current value or that of testValue2
if (testValue > 0) {
	testValue = testValue - 100;
} else if (testValue < 0) {
	testValue = testValue + 100;
} else {
	if (testValue2 > 0) {
		testValue = testValue2 - 100;
	} else if (testValue2 < 0) {
		testValue = testValue2 + 100;
	} else {
		testValue = 9999;
	}
}

Code Reuse

Reusable pieces of code should be packaged as beans so that they can be reused. It is good practice to create static methods where possible so that a new instance of the class does not need to be instantiated for each method. If this cannot be done then a workaround is to use the cBeanRegister component which is mentioned in the best practices section of this document.

However it is achieved, it is important to make sure that code is reused wherever possible.

Web Service Standards

This section describes the standards that should be followed when building the Web Services. Web Services align quite closely to Data Integration jobs, so there will be some commonality between the standards for Web Services and Data Integration.

Context Variables

The standards for context variables are the same as for the Mediation routes.

For more information, see the Context Variables section for Mediation routes.

Naming Conventions

The naming conventions for the Web Services are as follows:

  • Web Service names start with ws_.
  • The rest of the name should be in upper camel case following a consistent project wide format.

An example of a service name for a Remittance project that retrieves a balance might be ws_RemittanceRetrieveBalance.

All names should be approved by the lead developer/team and should be specified in the design to keep a consistent naming approach for the project.

Remember that the Web Service will be a Talend Job and its name does not necessarily have to have anything in common with its endpoint or URI. Standards for these are in the next section.

Web Service Endpoints and URIs

A Web Service is exposed to its consumers via an Endpoint and a URI. It is important that each service has a different endpoint or it will overwrite the current service running using that endpoint when the new service is started. It is possible to share endpoints if multiple services are implemented in the same job. If that is done then the services need to be distinguished by different URIs.

This is a convenient way of grouping services by endpoint; however, it can lead to very big and messy jobs.

Java Standards

The Java standards are the same as for the Mediation routes.