Stability and performance enhancements in Talend Cloud infrastructure

Issues and impacts

Some users reported intermittent Bad Gateway (502) and Gateway Timeout (504) errors when using the Talend Cloud API, which forced them to manually restart failed tasks.

Only a few endpoints were reported to have this issue, and it affected less than 0.01% of their requests.
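
As an illustration of how a client can tolerate transient 502 and 504 responses like the ones described above, here is a minimal Python sketch of a retry loop with exponential backoff. It is not an official Talend recommendation or part of any Talend SDK: the function name call_with_retry, the send callable, and the default attempt count and delay are all assumptions made for this example.

```python
import time
from typing import Callable, Tuple

# Status codes named in this notice as intermittent gateway errors.
TRANSIENT_STATUSES = {502, 504}


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-transient status or attempts run out.

    send is a hypothetical zero-argument callable that performs one HTTP
    request and returns (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    attempt = 1
    while status in TRANSIENT_STATUSES and attempt < max_attempts:
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
        attempt += 1
    # Either a non-transient response or the last transient one.
    return status, body
```

A retry of this kind is only safe for idempotent requests. Retrying a call that starts a task could start it twice, so such calls need a check of the task state before they are repeated.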

Related ticket: SRESEC-3188

Root causes

  • At the end of 2023, some endpoints were migrated to a new API gateway solution.
  • At the beginning of 2024, a tool was implemented to automatically launch and resize the compute cluster in order to improve cost efficiency and resource usage. This resulted in more frequent evictions and restarts of the API gateway services.

The issue was first logged between January 29, 2024, and February 5, 2024.

Resolutions

The changes released in R2024-03 are part of the continuous effort to improve the stability and performance of the Talend Cloud infrastructure. The main changes are:
  • Horizontal Service Autoscalers have been implemented to scale out, meaning they automatically add more instances to the system to handle increased traffic as needed.
  • Graceful shutdown and PreStop hooks have been added to safely finish any ongoing tasks and wrap up customer sessions before shutting down services.
  • Plans for managing service interruptions (known as Disruption Budgets) and updates (such as a rolling-update deployment strategy) have been reviewed to ensure that a minimum number of services are kept running during deployment or node eviction.
  • The Kubernetes Pod anti-affinity and taints features have been implemented to help prevent multiple services from the same API gateway from being scheduled on the same node.
  • The Kubernetes topology spread constraints have been configured to ensure that API gateway services are evenly distributed across different availability zones.
  • The idle timeout settings between the Web Application Firewall (WAF) and the API gateway have been aligned and optimized.
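
To make the Kubernetes mechanisms listed above more concrete, the following Python sketch builds illustrative manifests as plain dictionaries and prints them as JSON, a format that kubectl accepts. Talend has not published its actual configuration, so every name, label, image, and number here (api-gateway, the replica counts, the 15-second PreStop delay, and so on) is a hypothetical placeholder. Node taints and the WAF idle timeout are not covered by this sketch.

```python
import json

# All names, labels, and numbers below are hypothetical placeholders.
APP_LABELS = {"app": "api-gateway"}
SELECTOR = {"matchLabels": APP_LABELS}

# Scale out on CPU load (HorizontalPodAutoscaler, autoscaling/v2).
autoscaler = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "api-gateway"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "api-gateway"},
        "minReplicas": 3,
        "maxReplicas": 10,
        "metrics": [{
            "type": "Resource",
            "resource": {"name": "cpu", "target": {"type": "Utilization", "averageUtilization": 70}},
        }],
    },
}

# Keep a minimum number of Pods running during voluntary evictions.
disruption_budget = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "api-gateway"},
    "spec": {"minAvailable": 2, "selector": SELECTOR},
}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "api-gateway"},
    "spec": {
        "selector": SELECTOR,
        # Rolling update that never drops below the current replica count.
        "strategy": {"type": "RollingUpdate", "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1}},
        "template": {
            "metadata": {"labels": APP_LABELS},
            "spec": {
                # Time allowed for the PreStop hook plus graceful shutdown.
                "terminationGracePeriodSeconds": 60,
                # Prefer placing gateway Pods on different nodes.
                "affinity": {"podAntiAffinity": {
                    "preferredDuringSchedulingIgnoredDuringExecution": [{
                        "weight": 100,
                        "podAffinityTerm": {"labelSelector": SELECTOR, "topologyKey": "kubernetes.io/hostname"},
                    }]
                }},
                # Spread gateway Pods evenly across availability zones.
                "topologySpreadConstraints": [{
                    "maxSkew": 1,
                    "topologyKey": "topology.kubernetes.io/zone",
                    "whenUnsatisfiable": "ScheduleAnyway",
                    "labelSelector": SELECTOR,
                }],
                "containers": [{
                    "name": "gateway",
                    "image": "example.invalid/api-gateway:1.0",
                    # Delay shutdown so in-flight requests can finish.
                    "lifecycle": {"preStop": {"exec": {"command": ["sh", "-c", "sleep 15"]}}},
                }],
            },
        },
    },
}

manifests = [autoscaler, disruption_budget, deployment]

if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

In this sketch the Deployment leaves out a fixed replica count so that the autoscaler alone controls it, and the disruption budget's minAvailable is kept below the autoscaler's minReplicas so that node drains can still make progress.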

The infrastructure has been updated with these changes transparently. No action is necessary on your side.
