Architecture Styles and Design Patterns - Cloud

Talend Cloud Physical Reference Architecture

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend API Designer, Talend API Tester, Talend Data Inventory, Talend Data Preparation, Talend Data Stewardship, Talend Management Console, Talend Pipeline Designer
Content: Installation and Upgrade, Reference Architecture
Last publication date: 2024-03-28

What is an Architecture Style?

An Architecture Style is a coarse-grained pattern that provides an abstract framework for a family of systems.

There are four main architectural styles of Data Processing: Batch, Real-time, Event-driven, and Streaming.

Batch

Batch processing is a method of running high-volume, repetitive data jobs during a specified window. For data processing, tools that can perform this style of processing are commonly known as Data Integration tools, ETL tools (Extract, Transform, Load) or ELT tools (Extract, Load, Transform, or SQL "push-down"). However, modern tools such as Talend Data Fabric go far beyond these basic capabilities by adding Data Governance capabilities, as well as the ability to implement all of the architectural styles, not just batch.

Batch processing with such tools has the following characteristics:
  • Latency tolerant
  • Complex transformations
  • Massive volumes
  • Code-less specifications
  • Metadata reuse

A classic design pattern commonly implemented with Batch Data Integration is a Data Warehouse for business analytics and reporting:

Data Warehouse diagram.
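The batch load described above can be sketched in a few lines. This is only an illustration, not how Talend implements Jobs: the source rows, the `fact_orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for an extract from an operational system.
SOURCE_ORDERS = [
    {"order_id": 1, "customer": " alice ", "amount": "19.99", "day": "2024-03-01"},
    {"order_id": 2, "customer": "BOB", "amount": "5.00", "day": "2024-03-01"},
    {"order_id": 3, "customer": "alice", "amount": "10.01", "day": "2024-03-02"},
]

def extract():
    """Extract: read the whole batch from the source in one pass."""
    return list(SOURCE_ORDERS)

def transform(rows):
    """Transform: normalize customer names and convert amounts to integer cents."""
    return [
        (r["order_id"], r["customer"].strip().lower(),
         round(float(r["amount"]) * 100), r["day"])
        for r in rows
    ]

def load(conn, rows):
    """Load: write the batch into a fact table inside one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER, day TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)

def run_batch(conn):
    """Run one batch window, then answer a typical reporting query."""
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT customer, SUM(amount_cents) FROM fact_orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()

conn = sqlite3.connect(":memory:")
totals = run_batch(conn)
```

Because the load uses `INSERT OR REPLACE` keyed on `order_id`, rerunning the same batch leaves the totals unchanged, which is one simple way to make a scheduled job safe to restart.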

Real-time

Wikipedia describes Real-time computing as:

Real-time computing (RTC), or reactive computing is the computer science term for hardware and software systems subject to a "real-time constraint", for example from event to system response. Real-time programs must guarantee response within specified time constraints, often referred to as "deadlines".

Real-time processing is a near-instantaneous response to an action or event. Most mission-critical applications are real-time.

For Data Processing, Real-time usually refers to the implementation of REST or SOAP web services using the integration tooling. These services are therefore data-oriented. Talend Cloud Data Fabric provides extensive capabilities for the implementation of Real-time data processing architectures, including:
  • API Services
  • Creation of Data Services - SOAP or REST services implemented in Talend Studio using the same palette of components used to create Batch Jobs.
  • Routes - Graphically design Camel Routes in Talend Studio to implement SOAP or REST services
  • Deployment to Talend Runtime, as a Microservice or as a Microservice within a Container
  • Logging and Monitoring
  • Continuous Integration / Continuous Deployment
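As a rough illustration of a data-oriented REST service, the sketch below uses only the Python standard library. It is not Talend code: the `/customers/` path, the `CUSTOMERS` data set, and the function names are assumptions made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data set; a real data service would query a database.
CUSTOMERS = {"42": {"id": "42", "name": "Alice"}}

def handle_get(path):
    """Map a request path to a (status code, JSON body) pair."""
    prefix = "/customers/"
    if path.startswith(prefix):
        customer = CUSTOMERS.get(path[len(prefix):])
        if customer is not None:
            return 200, json.dumps(customer)
    return 404, json.dumps({"error": "not found"})

class CustomerHandler(BaseHTTPRequestHandler):
    """Answers each GET request synchronously, as soon as it arrives."""

    def do_GET(self):
        status, body = handle_get(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def make_server(port=8080):
    """Build the server; call serve_forever() on the result to start it."""
    return HTTPServer(("127.0.0.1", port), CustomerHandler)
```

Keeping the lookup in a plain function (`handle_get`) separate from the HTTP plumbing makes the request logic easy to test without opening a socket.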

Streaming

Wikipedia describes Streaming as follows:

Streaming data is data that is continuously generated by different sources. Such data should be processed incrementally using Stream Processing techniques without having access to all of the data. In addition, it should be considered that concept drift may happen in the data which means that the properties of the stream may change over time. It is usually used in the context of big data in which it is generated by many different sources at high speed.

For more information on Streaming and its use cases, see "What is Streaming data?".

Streaming has the following characteristics:
  • Low latency
  • Simple transformations and aggregations
  • Small batches, often known as 'micro-batching'
  • Fault tolerant
  • Minimum risk of data loss
  • Sliding window capability

The following diagram shows how a Spark Engine processes data streams in the form of micro-batches.

Diagram showing how data streams are processed by a Spark Engine.
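Micro-batching and sliding windows can be illustrated without a Spark cluster. The sketch below is a plain Python stand-in, not Spark code: the function names and the sample readings are hypothetical, and the window is measured in micro-batches rather than in time.

```python
from collections import deque
from itertools import islice

def micro_batches(stream, batch_size):
    """Cut a (possibly unbounded) iterator into small lists without reading it all."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def sliding_window_sums(stream, batch_size, window_batches):
    """Yield the sum over the most recent `window_batches` micro-batches."""
    window = deque(maxlen=window_batches)  # the oldest batch falls off automatically
    for batch in micro_batches(stream, batch_size):
        window.append(sum(batch))
        yield sum(window)

readings = [1, 2, 3, 4, 5, 6, 7]
results = list(sliding_window_sums(readings, batch_size=2, window_batches=2))
# results == [3, 10, 18, 18]
```

Each result is available as soon as its micro-batch closes, so the consumer never needs the full data set. That is the property the characteristics above describe as low latency with simple aggregations.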

Event-driven

Wikipedia defines an Event-driven architecture as follows:

Event-driven architecture (EDA) is a software architecture paradigm promoting the production, detection, consumption of, and reaction to events.

An event can be defined as "a significant change in state". For example, when a consumer purchases a car, the car's state changes from "for sale" to "sold". A car dealer's system architecture may treat this state change as an event whose occurrence can be made known to other applications within the architecture. From a formal perspective, what is produced, published, propagated, detected or consumed is a (typically asynchronous) message called the event notification, and not the event itself, which is the state change that triggered the message emission. Events do not travel, they just occur. However, the term event is often used metonymically to denote the notification message itself, which may lead to some confusion. This is due to Event-Driven architectures often being designed atop message-driven architectures, where such communication pattern requires one of the inputs to be text-only, the message, to differentiate how each communication should be handled.

As described, this style (especially for data processing) is most commonly associated with message-driven architectures. However, Talend can also implement other event-driven examples. One is a Route that polls an FTP server and processes files once their upload is complete. Another is a web service that instantiates an asynchronous process, that is, the web service does not wait for the process to complete before responding to its client.
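The asynchronous web service case can be sketched as follows. This is an assumption-laden illustration rather than a Talend Route or Job: `start_process`, `get_status`, and `long_running_process` are hypothetical names, and a thread pool stands in for whatever runtime executes the work.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job id -> Future

def long_running_process(filename):
    """Hypothetical stand-in for the real work, such as processing an uploaded file."""
    return f"processed {filename}"

def start_process(filename):
    """Service entry point: schedule the work and return a job id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(long_running_process, filename)
    return {"status": "accepted", "job_id": job_id}

def get_status(job_id):
    """Separate call the client can make later to check on the job."""
    future = jobs[job_id]
    if future.done():
        return {"status": "done", "result": future.result()}
    return {"status": "running"}

response = start_process("orders.csv")
```

The client receives `accepted` plus a job id without waiting for the work, and can poll `get_status` later. A production design would typically also persist job state so that it survives a restart.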

Typical characteristics of an Event-driven architecture include:
  • Message-based
  • Ensured delivery
  • Restart/recovery
  • Transaction-oriented

The following diagram shows an enterprise bus where messages are published to topics and read by the subscribers.

Diagram showing how messages are published to topics and then read.
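The publish/subscribe flow can be sketched with a minimal in-memory bus. This is an illustration only: the `MessageBus` class, the topic names, and the sample message are hypothetical, delivery here is synchronous, and real message brokers add the persistence and retries needed for ensured delivery and restart/recovery.

```python
from collections import defaultdict

class MessageBus:
    """Minimal in-memory topic bus, for illustration only."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback that receives every message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the event notification to every subscriber of the topic."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)

bus = MessageBus()
inventory_log, billing_log = [], []
bus.subscribe("car.sold", inventory_log.append)
bus.subscribe("car.sold", billing_log.append)
bus.subscribe("car.listed", inventory_log.append)

delivered = bus.publish("car.sold", {"vin": "VIN-001", "state": "sold"})
```

The publisher knows only the topic name, not who is listening, so new subscribers can be added without changing the publishing code.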

What is a Design Pattern?

A Design Pattern is a general reusable solution to a commonly occurring problem within a given context.

Characteristics:
  • Represent field-tested solutions to common design problems
  • Are generally repeatable by most IT professionals involved with design
  • Can be used to ensure consistency in how systems are designed and built
  • Can become the basis for design standards for our Job Designs, Routes, etc.

Common design patterns with Architecture Styles

Architecture Style: Batch
Design Patterns:
  • ETL Load
  • ELT Load
  • CDC (Change Data Capture)
  • File Transfer
Examples: Loading a Data Warehouse at a defined frequency, daily incremental data loads, FTP transfers, data replication, etc.

Architecture Style: Real-time
Design Patterns:
  • Web Services
  • Message Exchange patterns
  • Microservices
Examples: Salesforce updates, reading queues, Enterprise Service Bus message reads, API services for integration

Architecture Style: Streaming
Design Patterns:
  • Top N (Trending)
  • Stream Joins
  • External Lookup
  • Sliding/Rolling windows
  • Responsive Shuffling
  • Out-Of-Sequence effects
Examples: Leaderboards, Twitter streams, live streams

Architecture Style: Event-driven
Design Patterns:
  • Publish/Subscribe
  • Asynchronous Push
  • Receiver Flow Control
Examples: Sensor data on event occurrence, workflow events, file triggers
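
As one concrete instance of the streaming patterns listed above, the sketch below combines Top N (Trending) with a sliding window. It is a simplified illustration: the `TrendingTopN` class and the sample tags are hypothetical, and the window counts events rather than elapsed time.

```python
from collections import Counter, deque

class TrendingTopN:
    """Track the most frequent items among the last `window_size` events."""

    def __init__(self, window_size, n):
        self.window = deque()
        self.window_size = window_size
        self.n = n
        self.counts = Counter()

    def add(self, item):
        """Record a new event and expire the oldest one if the window is full."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts[expired] -= 1
            if self.counts[expired] == 0:
                del self.counts[expired]

    def top(self):
        """Return the top N (item, count) pairs; ties are ordered by item name."""
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: self.n]

trending = TrendingTopN(window_size=5, n=2)
for tag in ["a", "b", "a", "c", "b", "b", "c"]:
    trending.add(tag)
top_two = trending.top()
# top_two == [("b", 2), ("c", 2)]
```

Only the last five events count, so "a" drops out of the ranking once its earlier occurrences leave the window. This is how a leaderboard can reflect what is trending now rather than all-time totals.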