Big Data: new features - 7.1

Talend Data Fabric Release Notes

Spark Job designer enhancements

Feature

Description

Spark version

Spark 2.3 is supported not only in Local mode but also with EMR 5.15 (and with CDH 6.0 and HDP 3.0 as technical previews), taking advantage of the innovations and improved stability in the latest version of Apache Spark.

Kerberos security

Talend now supports Kerberos on EMR with the addition of EMR 5.15.

tAzureFSConfiguration enhancements

Support for Azure Data Lake Storage and Azure Blob Storage in this component is available with Databricks.

Spark Codegen enhancements

These enhancements prepare the Talend Jobs for Apache Spark to use Spark Datasets.

Schema compliance

tSchemaComplianceCheck has been created.

Timestamp granularity

Users can output dates, hours, minutes and seconds contained in their Date-type data.
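
As an illustration of second-level granularity on a Date value, here is a minimal Java sketch. The class name, the pattern string and the UTC time zone are assumptions chosen for the example; the release note does not specify how the feature is configured in the Studio.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of any Talend API.
public class TimestampGranularity {

    // Formats a java.util.Date with date, hours, minutes and seconds.
    // The pattern is an assumption chosen for this example.
    public static String format(Date value) {
        SimpleDateFormat formatter =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        // Fix the time zone so the result does not depend on the host settings.
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatter.format(value);
    }

    public static void main(String[] args) {
        // 3,661,000 ms after the epoch is 01:01:01 on 1 January 1970 (UTC).
        System.out.println(format(new Date(3661000L)));
    }
}
```

With a date-only pattern such as "yyyy-MM-dd", the hours, minutes and seconds carried by the same Date value would not appear in the output.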

Support for Big Data platforms

Feature

Description

Cloud Big Data platforms

Support for the following platforms has been added:

  • Databricks:
    • Spark Jobs support Azure Databricks and Databricks on AWS.
    • DBFS components have been created.
  • Qubole:
    • Support for this distribution has been added to Hive components, Pig components and Spark Jobs.

Together, these changes improve return on investment with serverless Big Data and reduce processing costs by using Spark as a service in the Cloud. They enable transient usage for data management, bring more flexibility with elastic processing, and enable pay-per-use for Spark computing.

Upgraded support for Hadoop distributions

  • Hortonworks Data Platform V2.6.0.3-8
  • EMR 5.15
  • MapR 6.0.1 with MEP 5.0

Dynamic Hadoop distributions

You can now use a Cloudera or Hortonworks version that had not yet been released when your Talend Studio was released, by adding this version yourself in a few clicks. This brings more agility and flexibility.

Dynamic distributions for HDP 3.x and CDH 6.x are in technical preview in this release.

Other components

Feature

Description

Kafka components

The Kafka components support Kafka V1.1.0 in Standard Jobs.

Sqoop and HCatalog

tSqoopExport can now read the schema from HCatalog.

Hive metastore

Users can set up an HA (High Availability) Hive metastore using the Hive connection metadata wizard or the tHiveConfiguration component in a Spark Job.

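For context, Hive clients usually reach an HA metastore through the hive.metastore.uris property, which accepts a comma-separated list of Thrift URIs. The sketch below builds such a value in Java. The class name and host names are hypothetical, 9083 is the usual default metastore port, and the release note does not state which fields the wizard or tHiveConfiguration exposes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any Talend or Hive API.
public class MetastoreUris {

    // Joins metastore hosts into the comma-separated Thrift URI list
    // expected by the hive.metastore.uris property.
    public static String build(List<String> hosts, int port) {
        return hosts.stream()
                .map(host -> "thrift://" + host + ":" + port)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Host names are placeholders for two metastore instances.
        List<String> hosts = Arrays.asList(
                "metastore1.example.com", "metastore2.example.com");
        System.out.println("hive.metastore.uris=" + build(hosts, 9083));
    }
}
```

Listing more than one URI is what lets a client fall back to another metastore instance when one is unavailable.
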
HDFS

Explicit support for the WebHDFS scheme and the ADLS scheme has been added to the HDFS components.

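To show what a scheme refers to here, the following Java sketch extracts the scheme from two example locations. The webhdfs:// and adl:// prefixes are the schemes Hadoop uses for WebHDFS and Azure Data Lake Storage; the class name, host names, account name and paths are placeholders, and the exact URI form the HDFS components expect is not given in this release note.

```java
import java.net.URI;

// Hypothetical helper, not part of any Talend or Hadoop API.
public class HdfsSchemes {

    // Returns the scheme part of a file system location, for example "webhdfs".
    public static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    public static void main(String[] args) {
        // Both locations are made-up examples.
        String webHdfs = "webhdfs://namenode.example.com:50070/user/talend/in.csv";
        String adls = "adl://myaccount.azuredatalakestore.net/data/out";
        System.out.println(schemeOf(webHdfs));
        System.out.println(schemeOf(adls));
    }
}
```
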
Google BigQuery

The Google service account mode is supported to authenticate to Google BigQuery.

MapR OJAI

The tMapROjaiInput component has been created.

MarkLogic

MarkLogic V9.0.5 is supported.

Continuous Deployment

Feature

Description

Continuous Deployment: Docker support

You can now configure your Continuous Integration server to deploy artifacts of your Talend project to a Docker registry.