Big Data - Cloud - 8.0

Talend Release Notes

Support for Amazon EMR 6.6.0 and 6.7.0 with Spark Universal 3.2.x

You can now run your Spark Jobs on an Amazon EMR cluster using Spark Universal with Spark 3.2.x in YARN cluster mode. You can configure this mode either in the Spark Configuration view of your Spark Jobs or in the Hadoop Cluster Connection metadata wizard.

When you select this mode, Talend Studio is compatible with Amazon EMR versions 6.6.0 and 6.7.0.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data

Support for Databricks runtime 11.x with Spark Universal 3.3.x

You can now run your Spark Batch and Streaming Jobs on transient and interactive Databricks clusters on Google Cloud Platform (GCP), AWS, and Azure using Spark Universal with Spark 3.3.x. You can configure it either in the Spark Configuration view of your Spark Jobs or in the Hadoop Cluster Connection metadata wizard.

When you select this mode, Talend Studio is compatible with Databricks runtime 11.x.

With the general availability of this feature, the following known issues are now fixed:
  • tGSConfiguration now works in Spark Streaming Jobs
  • tS3Configuration now works as a storage component for tAvroInput when using AWS
  • tAzureFSConfiguration now works as a storage component for tAvroInput when using Azure
  • tFileInputDelimited, tFileInputJSON, tFileInputParquet, tFileInputPositional, tFileInputRegex, and tFileInputXML now work with tGSConfiguration when using GCP

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data

Support for BigDecimal in tRedshiftOutput

You can now use BigDecimal values in the schema of the tRedshiftOutput component in your Spark Batch Jobs.
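As a rough illustration of why an exact decimal type matters for numeric columns, the sketch below compares binary floating point with exact decimal arithmetic. It is not Talend code: Python's `decimal.Decimal` stands in for Java's BigDecimal as an assumption, and the values are made up for the example.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum carries a small rounding error.
float_total = 0.1 + 0.2

# An exact decimal type (standing in for Java's BigDecimal here)
# keeps the values exactly as written.
decimal_total = Decimal("0.1") + Decimal("0.2")

print(float_total == 0.3)
print(decimal_total == Decimal("0.3"))
```

The first comparison fails and the second succeeds, which is the kind of difference that matters for amounts and other values that must not drift when written to a database.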

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data

Support for tGSConfiguration with Spark Universal

You can now use the tGSConfiguration component to provide access to Google Storage with other input and output components. This feature applies to both Spark Batch and Spark Streaming Jobs.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data

Support for schema registry

You can now use a schema registry in Spark Streaming Jobs with the following components:
  • tKafkaConfiguration
  • tKafkaInputAvro

Schema registry allows Talend Studio to register information about Avro records.
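To make the idea of registering Avro schema information more concrete, here is a minimal sketch of how a registry can pair messages with schemas. It assumes a Confluent-style convention (a zero magic byte followed by a 4-byte big-endian schema ID in front of the Avro payload); this page does not state which registry or framing Talend Studio uses. The `InMemorySchemaRegistry` class and the `frame` and `unframe` helpers are hypothetical names for illustration only.

```python
import json
import struct


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a registry service: maps schemas to integer IDs."""

    def __init__(self):
        self._ids = {}
        self._schemas = {}

    def register(self, schema: dict) -> int:
        # Registering the same schema twice returns the same ID.
        key = json.dumps(schema, sort_keys=True)
        if key not in self._ids:
            new_id = len(self._ids) + 1
            self._ids[key] = new_id
            self._schemas[new_id] = schema
        return self._ids[key]

    def lookup(self, schema_id: int) -> dict:
        return self._schemas[schema_id]


def frame(schema_id: int, payload: bytes) -> bytes:
    # Assumed framing: magic byte 0, then the schema ID as 4 big-endian bytes.
    return struct.pack(">bI", 0, schema_id) + payload


def unframe(message: bytes):
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("unexpected magic byte")
    return schema_id, message[5:]


registry = InMemorySchemaRegistry()
user_schema = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "id", "type": "long"}],
}
schema_id = registry.register(user_schema)

# b"\x02" is the Avro binary encoding of a record whose only field is the long 1.
message = frame(schema_id, b"\x02")
read_id, payload = unframe(message)
reader_schema = registry.lookup(read_id)
```

A consumer only needs the ID carried in the message to fetch the matching schema, so the full schema does not travel with every record.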

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data

Support for S3 Select

You can now use S3 Select with tFileInputDelimited and tFileInputJSON when using tS3Configuration as a storage component in your Spark Jobs running with Spark Universal in either YARN cluster mode (with an Amazon EMR cluster) or Databricks mode. S3 Select lets you use Spark SQL queries to reduce the amount of data retrieved from S3.

When you run your Spark Jobs on Databricks, the S3 bucket must be in the same region as the cluster. Otherwise, an S3 exception is raised on the cluster side.
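The data reduction that S3 Select provides can be sketched with a small local simulation. Nothing below calls S3 or Talend: the object body, the column names, and both helper functions are hypothetical, and the filtering that S3 would perform on the storage side is imitated in memory.

```python
import csv
import io

# Hypothetical object contents standing in for a CSV file stored in S3.
OBJECT_BODY = (
    "id,country,amount\n"
    "1,FR,10.50\n"
    "2,US,99.00\n"
    "3,FR,7.25\n"
    "4,DE,42.00\n"
)


def fetch_whole_object(body: str) -> str:
    """Without pushdown: the full object is transferred to the job."""
    return body


def select_from_object(body: str, column: str, value: str) -> str:
    """With pushdown: rows are filtered on the storage side before transfer.

    Roughly what a query such as
    SELECT * FROM S3Object s WHERE s.country = 'FR' would return.
    """
    reader = csv.DictReader(io.StringIO(body))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] == value:
            writer.writerow(row)
    return out.getvalue()


full = fetch_whole_object(OBJECT_BODY)
selected = select_from_object(OBJECT_BODY, "country", "FR")
print(len(full), len(selected))
```

Only the matching rows leave the storage layer in the second case, which is where the savings come from on large objects.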

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data