Big Data: new features - Cloud - 7.3

Talend Release Notes

Last publication date
2024-03-20

Spark Job designer enhancements

Feature

Description

Available in

ADLS Gen2
Azure Data Lake Storage Generation 2 is now supported with the following Big Data platforms:
  • Databricks V5.5 LTS
  • Cloudera CDH V6.1
  • Hortonworks Data Platform V3.1

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data
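
For readers new to ADLS Gen2, the sketch below shows how a path in an ADLS Gen2 file system is usually addressed through the Hadoop ABFS driver, using the abfss:// scheme. The function name and the container, account, and file names are hypothetical and only serve as illustration; how each Talend component exposes these values in its settings is not covered by these notes.

```python
def abfs_uri(filesystem: str, account: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS URI for a path inside an ADLS Gen2 file system (container)."""
    # "abfss" is the TLS-secured variant of the "abfs" scheme.
    scheme = "abfss" if secure else "abfs"
    # Strip leading slashes so the result has exactly one slash after the host.
    clean_path = path.lstrip("/")
    return f"{scheme}://{filesystem}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical names, for illustration only.
print(abfs_uri("sales", "examplestorage", "/raw/2020/orders.parquet"))
```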

Snowflake
The Snowflake components for Spark Batch are now generally available.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

Native Datasets
In Spark Batch Jobs, support for native Spark Datasets has been added to more components, which brings inherent performance gains. To benefit from this enhancement, use Spark V2.0 or later with the following components:
  • tFileInputParquet and tFileOutputParquet
  • tFileInputDelimited and tFileOutputDelimited
  • tFileInputFullRow
  • tFileInputPositional and tFileInputRegex
  • tSortRow, tExtractDelimitedFields, tExtractPositionalFields, tExtractRegexFields, tExtractXMLField, tExtractJSONFields, tNormalize, tReplace, tReplicate, tSample, tUnite and tSchemaComplianceCheck.
The following components require Spark V2.1 or later to support Spark Datasets:
  • tAggregateRow
  • Left Outer Join in tMap, in addition to the tMap features that have supported Datasets since Talend Studio V7.2.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data
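
The version requirements listed for Native Datasets can be summarized as a small lookup. The sketch below is only an illustration built from the list above: the dictionary, the function names, and the "tMap (Left Outer Join)" key are hypothetical, only a few of the listed components are included, and the code does not reflect how Talend Studio makes this decision internally.

```python
# Minimum Spark version (major, minor) for Dataset support, as stated in these notes.
# Only a subset of the listed components is shown here.
DATASET_MIN_SPARK = {
    "tFileInputParquet": (2, 0),
    "tFileOutputParquet": (2, 0),
    "tFileInputDelimited": (2, 0),
    "tFileOutputDelimited": (2, 0),
    "tSortRow": (2, 0),
    "tUnite": (2, 0),
    "tAggregateRow": (2, 1),
    "tMap (Left Outer Join)": (2, 1),  # hypothetical key for the tMap join mode
}


def parse_version(text: str) -> tuple:
    """Turn 'V2.4', '2.1.0' or '2' into a (major, minor) tuple."""
    parts = (text.lstrip("Vv").split(".") + ["0"])[:2]
    return tuple(int(p) for p in parts)


def meets_dataset_requirement(component: str, spark_version: str) -> bool:
    """Return True if the notes list the component and the Spark version is high enough."""
    minimum = DATASET_MIN_SPARK.get(component)
    if minimum is None:
        return False  # not in this (partial) table
    return parse_version(spark_version) >= minimum


print(meets_dataset_requirement("tAggregateRow", "V2.0"))  # False: needs V2.1 onwards
print(meets_dataset_requirement("tSortRow", "V2.0"))       # True
```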

Delta Lake
The tDeltaLakeInput and tDeltaLakeOutput components are now generally available.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

Apache Spark V2.4
This Apache Spark version is now supported with more Big Data platforms in Spark Batch and Spark Streaming Jobs. The platforms that now support Spark V2.4 are:
  • Cloudera CDH V6.1.1
  • Databricks V5.5
  • Google Cloud Dataproc V1.4

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

Job status
With Databricks, users can now configure how often the Studio polls the Spark cluster for the Job status.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data
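
To illustrate what a configurable Job status interval controls, here is a generic polling loop. It is a sketch under stated assumptions, not the Studio's implementation: the state names, the wait_for_job function, and its parameters are all hypothetical, and the status source is a stand-in rather than a real Databricks call.

```python
import time
from typing import Callable

# Hypothetical terminal state names, for illustration only.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job(get_status: Callable[[], str],
                 poll_interval_s: float,
                 max_polls: int = 1000,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Ask for the job status every poll_interval_s seconds until it is terminal."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # A longer interval means fewer requests to the cluster,
        # at the cost of noticing completion later.
        sleep(poll_interval_s)
    raise TimeoutError(f"job still running after {max_polls} status checks")


# Stand-in status source that finishes on the fourth check.
fake_statuses = iter(["PENDING", "RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda: next(fake_statuses), poll_interval_s=0.01))
```

The trade-off is the usual one for polling: a short interval gives faster feedback, a long one reduces load on the cluster.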

tS3Configuration
With Amazon EMR, users can now apply an S3 bucket policy.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data
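
As background on the tS3Configuration entry, an S3 bucket policy is a JSON document attached to a bucket on the AWS side. The sketch below builds one common example, a policy that denies uploads not using SSE-S3 server-side encryption. The bucket name and function name are hypothetical, and these notes do not state which policies the component is intended to work with, so treat this purely as an illustration of the policy format.

```python
import json


def deny_unencrypted_uploads_policy(bucket: str) -> dict:
    """Build a bucket policy that rejects PutObject requests without SSE-S3 encryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
                },
            }
        ],
    }


# Hypothetical bucket name, for illustration only.
print(json.dumps(deny_unencrypted_uploads_policy("example-talend-bucket"), indent=2))
```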

tAggregateRow
In Spark Batch Jobs, the Count (distinct) function and the Sample Standard Deviation Algorithm function have been added.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data
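
For the two tAggregateRow functions mentioned above, the sketch below shows what a distinct count and a sample standard deviation compute per group, using the conventional definitions. The sample standard deviation divides the sum of squared deviations by n - 1 rather than n. The data and function name are hypothetical, and the code is plain Python rather than the code a Spark Job generates.

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical input rows: (group key, value).
rows = [
    ("north", 10.0), ("north", 12.0), ("north", 10.0),
    ("south", 7.0), ("south", 9.0),
]


def aggregate(rows):
    """Group rows by key, then compute a distinct count and a sample standard deviation."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    result = {}
    for key, values in groups.items():
        result[key] = {
            # Count (distinct): duplicates within the group are counted once.
            "count_distinct": len(set(values)),
            # statistics.stdev uses the n - 1 denominator; it needs at least two values.
            "sample_stddev": stdev(values) if len(values) > 1 else None,
        }
    return result


for key, stats in aggregate(rows).items():
    print(key, stats)
```

In the "north" group the value 10.0 appears twice, so the distinct count is 2 even though the group holds three rows.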

New driver versions
Support for the following driver versions has been added to the related components:
  • Redshift JDBC driver V1.23.7.106
  • MySQL driver V8.0.18
  • Teradata JDBC driver V16.20.00.13
  • MariaDB JDBC driver V2.5.3 in JDBC components
  • Snowflake JDBC driver V3.11.x

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

New components available

Two new components are now available: tAzureAdlsGen2Input and tAzureAdlsGen2Output.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

Support for Big Data platforms

Feature

Description

Available in

Databricks
  • Databricks V5.5 LTS is now supported by Spark Jobs.
  • Support for transient clusters of Azure Databricks has been added.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

Hortonworks Data Platform
  • Hortonworks Data Platform V3.1 is supported.
  • The Hortonworks Data Platform V3.x series is now generally available among the Dynamic Distributions.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

Google Cloud Dataproc

  • Google Cloud Dataproc V1.4 is supported.
  • In Standard Jobs, tGoogleDataprocManage supports all regions.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

Custom Hadoop configuration
When defining connections to Cloudera or Hortonworks in the Repository, users can now specify a custom JAR file that provides the connection parameters of the Hadoop environment to be used.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

Other components

Feature

Description

Available in

Kafka
Kafka V2.2.1 is now officially supported with:
  • Cloudera CDH V6.1
  • Hortonworks Data Platform V3.1
  • Kafka components in Standard Jobs

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

Google BigQuery
  • In tBigQueryBulkExec, users can now drop tables with either a service account or their OAuth 2.0 credentials.
  • The BigQuery components now support Google Cloud client API 1.25.10.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

Couchbase
  • tCouchbaseOutput now allows users to perform N1QL queries with parameters.
  • Non-JSON documents are supported.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data
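
As an illustration of N1QL queries with parameters, the sketch below uses N1QL named parameters, written as $name in the statement, with values supplied separately rather than concatenated into the query text. The bucket name, field names, and helper function are hypothetical, and the code does not call Couchbase or show how tCouchbaseOutput is configured. It only checks that every placeholder has a value.

```python
import re

# Hypothetical statement: `travel` and the field names are made up for illustration.
STATEMENT = "UPDATE `travel` SET city = $city WHERE type = $type AND id = $id"


def missing_parameters(statement: str, params: dict) -> list:
    """Return the named placeholders in the statement that have no value in params."""
    # Simplification: a '$' inside a quoted string literal would also be matched.
    names = set(re.findall(r"\$([A-Za-z_]\w*)", statement))
    return sorted(names - set(params))


params = {"city": "Paris", "type": "hotel", "id": 42}
print(missing_parameters(STATEMENT, params))             # no missing names
print(missing_parameters(STATEMENT, {"city": "Paris"}))  # 'id' and 'type' are missing
```

Keeping values out of the statement text lets the same statement be reused for every incoming row and avoids quoting problems.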

CXF

CXF V3.3.4 is now supported in the following components:

  • tDBFSConnection, tDBFSGet, tDBFSPut
  • tHCatalogInput, tHCatalogLoad, tHCatalogOperation, tHCatalogOutput

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data

MongoDB

Support for MongoDB V4.2.x has been added to the MongoDB components in Standard Jobs.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All Talend products with Big Data