Big Data: new features - 8.0

Talend Big Data products Release Notes

Version: 8.0
Language: English (United States)
Product: Talend Big Data, Talend Big Data Platform, Talend Open Studio for Big Data, Talend Real-Time Big Data Platform
Content: Installation and Upgrade, Release Notes

Feature

Description

Product

Support of Spark Universal

You can now run your Spark Jobs using Spark Universal with Spark 2.4.x or Spark 3.0.x, in either Local or Yarn cluster mode.

Spark Universal is a mechanism that makes Talend Studio compatible with every big data distribution available for a given Spark version. It needs only a Hadoop configuration JAR file, which contains all the information required to connect to the cluster in Yarn cluster mode.
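Because a JAR file is a ZIP archive, you can check what a Hadoop configuration JAR carries before you reference it. The following is a minimal sketch, assuming the JAR holds the usual Hadoop client files such as core-site.xml and yarn-site.xml. The `list_site_files` and `build_sample_jar` helpers and the sample file names are illustrative assumptions, not part of Talend Studio.

```python
import io
import zipfile


def list_site_files(jar):
    """Return the sorted names of *-site.xml entries in a JAR (a ZIP archive)."""
    with zipfile.ZipFile(jar) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith("-site.xml")
        )


def build_sample_jar():
    """Build a small in-memory JAR with placeholder entries for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("core-site.xml", "yarn-site.xml", "META-INF/MANIFEST.MF"):
            archive.writestr(name, "<configuration/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # With a real file, pass its path instead of the in-memory sample.
    print(list_site_files(build_sample_jar()))
```

With a real file, pass its path to `list_site_files` instead of the in-memory sample.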

Spark Universal gives you more agility by letting you switch between Spark modes, distributions, and environments.

You can configure your Spark Universal connection either in the Spark configuration view of your Job or in the Hadoop Cluster Connection metadata wizard from the Repository tree view.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Support of Kubernetes with Spark Universal 3.1.x

You can now run your Spark Jobs using Spark Universal with Spark 3.1.x in Kubernetes mode.
You can configure your Spark Universal connection with Kubernetes either in the Spark configuration view of your Job or in the Hadoop Cluster Connection metadata wizard from the Repository tree view.
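For orientation, outside Talend Studio a Spark run on Kubernetes is described through standard Spark settings: a `k8s://` master URL and `spark.kubernetes.*` properties. The sketch below assembles such `spark-submit` arguments. It is not how Studio generates its configuration, and the `kubernetes_submit_args` helper, the API server URL, the namespace, and the image name are hypothetical values chosen for illustration.

```python
def kubernetes_submit_args(api_server, namespace, image):
    """Build spark-submit arguments for a cluster-mode run on Kubernetes."""
    conf = {
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
    }
    # The master URL is the Kubernetes API server address prefixed with k8s://
    args = ["--master", f"k8s://{api_server}", "--deploy-mode", "cluster"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(" ".join(kubernetes_submit_args(
        "https://k8s.example.com:6443", "talend-jobs", "example/spark:3.1.2")))
```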

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Support of Dynamic Schema in Spark Batch components

You can now use the Dynamic Schema in your Spark Jobs with the following components:
  • tDeltaLakeInput
  • tDeltaLakeOutput
  • tFileInputParquet
  • tFileOutputParquet
  • tJDBCInput
  • tJDBCOutput
  • tLogRow
  • tSqlRow
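
The idea behind a dynamic schema is that some columns are only known at run time. The following plain Python sketch (not Talend or Spark code) keeps the columns defined at design time and gathers any other columns into a single dynamic field. The `split_record` helper and the column names are hypothetical.

```python
def split_record(record, fixed_columns):
    """Separate design-time columns from columns discovered at run time."""
    fixed = {name: record[name] for name in fixed_columns}
    dynamic = {k: v for k, v in record.items() if k not in fixed_columns}
    return {**fixed, "dynamic": dynamic}


if __name__ == "__main__":
    # "country" and "score" are not declared, so they land in the dynamic field.
    row = {"id": 1, "name": "Ada", "country": "UK", "score": 9.5}
    print(split_record(row, ["id", "name"]))
```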

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Support of new distributions

Delivered in 7.3 monthly releases

You can use the following distributions for your Spark Jobs:
  • Microsoft HD Insight 4.0 with Spark 2.4 (delivered in 7.3 R2020-06 monthly release)
  • CDP Private Cloud Base 7.1 with Spark 2.4 (delivered in 7.3 R2020-06 monthly release)
  • Databricks 7.3 LTS with Spark 3.0 (delivered in 7.3 R2021-02 monthly release)
  • CDP Public Cloud Data Hub (delivered in 7.3 R2021-03 monthly release)
  • AWS EMR 6.2 with Spark 3.0 (delivered in 7.3 R2021-05 monthly release)
  • Azure Synapse with Spark 3.0 (delivered in 7.3 R2021-08 monthly release)

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Support of Spark 3.0 in local mode for Spark Jobs

Delivered in 7.3 R2021-02 monthly release

Talend now supports Spark 3.0 in local mode when running Spark Jobs in Talend Studio.
Note: The following elements do not support Spark 3.0 in local mode:
  • ADLS Gen2
  • tCassandraInput and tCassandraOutput
  • tElasticSearchInput and tElasticSearchOutput

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Support of Knox for CDP Public Cloud Data Hub on AWS

Delivered in 7.3 R2021-06 monthly release

When you use a CDP Public Cloud Data Hub instance on AWS with CDP 7.1 or later in YARN cluster and HDFS modes, you can now authenticate using Knox either in the Spark configuration view of your Spark Jobs or in the Hadoop Cluster Connection metadata wizard from the Repository tree view. Knox gives you a single point of authentication using only SSO.
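
To illustrate the single entry point, the sketch below composes a WebHDFS URL routed through a Knox gateway, following the general Apache Knox pattern of gateway URL, then topology, then service path. The `knox_webhdfs_url` helper, the host, the `cdp-proxy-api` topology name, and the HDFS path are assumptions for illustration; use the values of your own Data Hub cluster.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway, topology, hdfs_path, op="LISTSTATUS"):
    """Compose a WebHDFS URL that is routed through a Knox gateway."""
    path = quote(hdfs_path.strip("/"))
    query = urlencode({"op": op})
    return f"{gateway.rstrip('/')}/{topology}/webhdfs/v1/{path}?{query}"


if __name__ == "__main__":
    # Hypothetical gateway and topology, for illustration only.
    print(knox_webhdfs_url(
        "https://knox.example.com:8443/gateway", "cdp-proxy-api", "/tmp/data"))
```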

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Support of Hive Warehouse Connector with Cloudera CDP 7.1.x

Delivered in 7.3 R2021-10 monthly release

You can now use the Hive Warehouse Connector to get data from and write data to Hive transactional managed tables in Spark Batch Jobs with the following new components:

  • tHiveWarehouseConfiguration: enables the reuse of the Hive Warehouse Connector connection configuration to Hive in the same Job.
  • tHiveWarehouseInput: extracts data from Hive and sends the data to the component that follows using Hive Warehouse Connector.
  • tHiveWarehouseOutput: connects to a given Hive database and writes the received data into a given Hive table or a directory in HDFS using Hive Warehouse Connector.

With Hive Warehouse Connector, Talend Studio supports Hive transactional managed tables, which gives you better transaction control over your data.
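
Outside Talend Studio, the Hive Warehouse Connector is typically configured through Spark properties. The sketch below collects three of them as `spark-submit` arguments. The property names follow Cloudera's Hive Warehouse Connector documentation as far as known, so check them against the documentation for your CDP version. The `hwc_submit_conf` helper and all values are hypothetical.

```python
def hwc_submit_conf(jdbc_url, metastore_uri, staging_dir):
    """Return spark-submit --conf arguments for the Hive Warehouse Connector."""
    conf = {
        "spark.sql.hive.hiveserver2.jdbc.url": jdbc_url,
        "spark.datasource.hive.warehouse.metastoreUri": metastore_uri,
        "spark.datasource.hive.warehouse.load.staging.dir": staging_dir,
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    for arg in hwc_submit_conf(
        "jdbc:hive2://hs2.example.com:10000/default",
        "thrift://metastore.example.com:9083",
        "/tmp/hwc-staging",
    ):
        print(arg)
```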

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform