Big Data: new features - 7.3

Talend Big Data products Release Notes

Author: Talend Documentation Team
Version: 7.3
Products: Talend Big Data, Talend Big Data Platform, Talend Open Studio for Big Data, Talend Real-Time Big Data Platform
Task: Installation and Upgrade, Release Notes

Spark Job designer enhancements

Feature

Description

Product

ADLS Gen2
Azure Data Lake Storage Gen2 (ADLS Gen2) is now supported with the following Big Data platforms:
  • Databricks V5.5 LTS
  • Cloudera CDH V6.1
  • Hortonworks Data Platform V3.1

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Snowflake
The Snowflake components for Spark Batch are now officially supported and are no longer in technical preview.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Native Datasets
In Spark Batch Jobs, support for native Spark Datasets has been extended to more components, which brings the performance gains inherent to Datasets. To benefit from this enhancement, users must use Spark V2.0 or later with the following components:
  • tFileInputParquet and tFileOutputParquet
  • tFileInputDelimited and tFileOutputDelimited
  • tFileInputFullRow
  • tFileInputPositional and tFileInputRegex
  • tSortRow, tExtractDelimitedFields, tExtractPositionalFields, tExtractRegexFields, tExtractXMLField, tExtractJSONFields, tNormalize, tReplace, tReplicate, tSample, tUnite and tSchemaComplianceCheck.
The following components require Spark V2.1 or later to support Spark Datasets:
  • tAggregateRow
  • Left Outer Join in tMap, in addition to the tMap features that have had support for Datasets since Talend Studio V7.2.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform
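
For readers unfamiliar with the join type named in the Native Datasets entry, the following is a minimal sketch of left outer join semantics in plain Python. It does not use Spark or Talend; the function name, the sample rows and the assumption that non-key column names do not collide are all illustrative.

```python
def left_outer_join(left_rows, right_rows, key):
    """Join two lists of dicts on `key`, keeping every left row.

    Illustrative helper only: it assumes the non-key column names of
    the two inputs do not collide.
    """
    # Index the right-hand (lookup) rows by join key.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    # Columns contributed by the right-hand side.
    right_columns = sorted({c for row in right_rows for c in row if c != key})

    joined = []
    for left in left_rows:
        matches = index.get(left[key], [])
        if not matches:
            # No match: keep the left row and fill right columns with None.
            joined.append({**left, **{c: None for c in right_columns}})
        for right in matches:
            joined.append({**left, **{c: right.get(c) for c in right_columns}})
    return joined


orders = [{"customer_id": 1, "amount": 10}, {"customer_id": 2, "amount": 20}]
customers = [{"customer_id": 1, "name": "Ada"}]
result = left_outer_join(orders, customers, "customer_id")
```

The second order has no matching customer, so it is kept with `name` set to `None` rather than being dropped as an inner join would do.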

Delta Lake
The tDeltaLakeInput and tDeltaLakeOutput components are no longer in technical preview.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Apache Spark V2.4
Apache Spark V2.4 is now supported with more Big Data platforms in Spark Batch and Spark Streaming Jobs. The platforms that now support Spark V2.4 are:
  • Cloudera CDH V6.1.1
  • Databricks V5.5
  • Google Cloud Dataproc V1.4

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Job status
With Databricks, users can now configure how often the Studio polls the Spark cluster for Job status.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform
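
To illustrate what a configurable status-polling interval controls in the Job status entry, here is a minimal sketch in Python. It is not how the Studio implements polling. The function name, the status strings and the `get_status` callback are hypothetical.

```python
import time

# Hypothetical terminal states; real cluster APIs use their own names.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job(get_status, poll_interval_seconds, sleep=time.sleep):
    """Poll `get_status()` until a terminal state is reported.

    Returns the final status and the number of status requests made.
    A longer interval means fewer requests but slower feedback.
    """
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in TERMINAL_STATES:
            return status, polls
        sleep(poll_interval_seconds)


# Usage with a fake status source and a fake sleep that records waits.
_statuses = iter(["PENDING", "RUNNING", "RUNNING", "SUCCEEDED"])
waits = []
final_status, poll_count = wait_for_job(lambda: next(_statuses), 5, sleep=waits.append)
```

Injecting `sleep` keeps the sketch testable without real delays.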

tS3Configuration
With Amazon EMR, users can now apply an S3 bucket policy.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

tAggregateRow
In Spark Batch Jobs, the Count (distinct) function and the Sample Standard Deviation Algorithm function have been added.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform
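
As a reminder of what the two aggregate functions added to tAggregateRow compute, the following is a minimal sketch in plain Python. It does not reflect how the component is implemented. The helper names and sample data are illustrative, and ignoring null values is an assumption borrowed from common SQL behavior, not something these notes state.

```python
import statistics


def count_distinct(values):
    """Count distinct values, ignoring None (assumed, as in SQL COUNT(DISTINCT))."""
    return len({v for v in values if v is not None})


def sample_std_dev(values):
    """Sample standard deviation: squared deviations are divided by n - 1."""
    data = [v for v in values if v is not None]
    if len(data) < 2:
        return None  # undefined for fewer than two values
    return statistics.stdev(data)


amounts = [2, 4, 4, 4, 5, 5, 7, 9]
distinct_amounts = count_distinct(amounts)   # 5 distinct values: 2, 4, 5, 7, 9
sample_deviation = sample_std_dev(amounts)   # sqrt(32 / 7), about 2.138
```

For the same data the population standard deviation (dividing by n instead of n - 1) is exactly 2, so the sample variant is slightly larger.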

New driver versions
Support for the following driver versions has been added to the related components:
  • Redshift JDBC driver V1.23.7.106
  • MySQL driver V8.0.18
  • Teradata JDBC driver V16.20.00.13
  • MariaDB JDBC driver V2.5.3 in JDBC components
  • Snowflake JDBC driver V3.11.x

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

New components available

Two new components are now available: tAzureAdlsGen2Input and tAzureAdlsGen2Output.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Support for Big Data platforms

Feature

Description

Product

Databricks
  • Databricks V5.5 LTS is now supported by Spark Jobs.
  • Support for transient clusters of Azure Databricks has been added.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Hortonworks Data Platform
  • Hortonworks Data Platform V3.1 is supported.
  • The Hortonworks Data Platform V3.x series is now officially available among the Dynamic Distributions. It is no longer in technical preview.

Talend Open Studio for Big Data

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Google Cloud Dataproc

  • Google Cloud Dataproc V1.4 is supported.
  • In Standard Jobs, tGoogleDataprocManage supports all regions.

Talend Open Studio for Big Data

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Custom Hadoop configuration
When defining connections to Cloudera or Hortonworks in the Repository, users can now specify a custom JAR file that provides the connection parameters of the Hadoop environment to be used.

Talend Open Studio for Big Data

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Other components

Feature

Description

Product

Kafka
Kafka V2.2.1 is now officially supported with:
  • Cloudera CDH V6.1
  • Hortonworks Data Platform V3.1
  • Kafka components in Standard Jobs

Talend Open Studio for Big Data

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Google BigQuery
  • In tBigQueryBulkExec, users can now drop tables with either a service account or their OAuth 2.0 credentials.
  • The BigQuery components now support Google Cloud client API V1.25.10.

Talend Open Studio for Big Data

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Couchbase
  • tCouchbaseOutput now allows users to perform N1QL queries with parameters.
  • Non-JSON documents are supported.

Talend Open Studio for Big Data

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

CXF

CXF V3.3.4 is now supported in the following components:

  • tDBFSConnection, tDBFSGet, tDBFSPut
  • tHCatalogInput, tHCatalogLoad, tHCatalogOperation, tHCatalogOutput

Talend Open Studio for Big Data

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

MongoDB

Support for MongoDB V4.2.x has been added to the MongoDB components in Standard Jobs.

Talend Open Studio for Big Data

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform