Spark Job designer enhancements
Feature | Description | Available in |
---|---|---|
ADLS Gen2 | Azure Data Lake Storage Gen2 is now supported with the following Big Data platforms: | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Snowflake | The Snowflake components for Spark Batch are now generally available. | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Native Datasets | In Spark Batch Jobs, support for native Spark Datasets has been added to more components to obtain performance gains. To benefit from this enhancement, use Spark V2.0 or later with the following components: The following components require Spark V2.1 or later to support Spark Datasets. | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Delta Lake | The tDeltaLakeInput and tDeltaLakeOutput components are now generally available. | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Apache Spark V2.4 | This Apache Spark version is now supported with more Big Data platforms in Spark Batch and Spark Streaming Jobs. The platforms that now support Spark V2.4 are: | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Job status | With Databricks, users can now configure how often the Studio polls a Spark cluster for Job status. | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
tS3Configuration | With Amazon EMR, users can now apply an S3 bucket policy. | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
tAggregateRow | In Spark Batch Jobs, the Count (distinct) function and the Sample Standard Deviation Algorithm function have been added. | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
New driver versions | Support for the following driver versions has been added to their related components: | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
New components available | Two new components are now available: tAzureAdlsGen2Input and tAzureAdlsGen2Output. | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
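
For readers unfamiliar with the two functions added to tAggregateRow, the following sketch shows what a distinct count and a sample standard deviation compute for each group. It is plain Python written for illustration only, not the component's implementation; the `aggregate` function, the `region` and `amount` column names, and the sample rows are all hypothetical.

```python
import math
from collections import defaultdict


def aggregate(rows, key, column):
    """Group rows by `key` and compute two aggregates on `column` (illustrative only)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])

    result = {}
    for group, values in groups.items():
        n = len(values)
        mean = sum(values) / n
        # Sample standard deviation divides by n - 1 rather than n,
        # so it is undefined for a group that holds a single value.
        if n > 1:
            sample_std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        else:
            sample_std = None
        result[group] = {
            # A distinct count ignores repeated values within the group.
            "count_distinct": len(set(values)),
            "sample_std": sample_std,
        }
    return result


# Hypothetical input rows.
rows = [
    {"region": "EU", "amount": 10.0},
    {"region": "EU", "amount": 10.0},
    {"region": "EU", "amount": 16.0},
    {"region": "US", "amount": 5.0},
]
summary = aggregate(rows, "region", "amount")
```

In this sketch the `EU` group has three rows but only two distinct amounts, and the `US` group has no sample standard deviation because it contains a single row. How the component itself reports such a single-row group is not stated in these notes.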
Support for Big Data platforms
Feature | Description | Available in |
---|---|---|
Databricks | | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Hortonworks Data Platform | | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Google Cloud Dataproc | | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Custom Hadoop configuration | When defining connections to Cloudera or Hortonworks in the Repository, users can now specify a custom JAR file that provides the connection parameters of the Hadoop environment to be used. | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Other components
Feature | Description | Available in |
---|---|---|
Kafka | Kafka V2.2.1 is now officially supported with: | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Google BigQuery | | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
Couchbase | | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
CXF | CXF V3.3.4 is now supported in the following components: | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |
MongoDB | Support for MongoDB V4.2.x has been added to the MongoDB components in Standard Jobs. | All subscription-based Talend products with Big Data (Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform) |