Big Data: new features - 7.0

Talend Data Fabric Release Notes

Author: Talend Documentation Team
Version: 7.0
Products: Talend Big Data, Talend Big Data Platform, Talend Open Studio for Big Data, Talend Real-Time Big Data Platform
Task: Installation and Upgrade

Enhancements of Spark Job designer

Feature

Description

Yarn cluster

Spark's Yarn cluster mode is now supported with the following distributions:
  • Amazon EMR

  • Cloudera

  • Hortonworks

  • MapR

  • Microsoft HDInsight

  • Cloudera Altus

Spark version

Spark 2.2 is now supported with Cloudera CDH 5.12 and Cloudera CDH 5.13.

Kudu components in Spark Batch Jobs

Read data from, and write data to, partitions of a Cloudera Kudu table.

These components support Kerberos when it is enabled globally at the Cloudera cluster level, but not when it is enabled at the Kudu level.

tAzureFSConfiguration enhancements

Support for Azure Data Lake Store in this component is available with Hortonworks Data Platform, Cloudera CDH, Cloudera Altus and Microsoft HDInsight.

tS3Configuration

  • When using the S3A file system, users can configure this component so that their Jobs assume a role and obtain the permissions associated with that role.

  • Users can also use the STS endpoints that the AWS administrator of their cluster has enabled for specific regions.

  • These new features are available when this component is used with CDH 5.10 and later, or with HDP 2.5 and later.

tHBaseOutput

Users can now use custom row keys with this component.

Machine learning components

Machine learning components now let you activate Spark checkpointing and configure the checkpointing interval. This breaks up long Resilient Distributed Dataset (RDD) lineages by saving intermediate RDDs to a checkpoint directory at the configured interval. Checkpointing is useful when lineage graphs are long, and it helps avoid StackOverflowError failures in Jobs that run many iterations.
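The effect of the checkpointing interval can be illustrated with a toy Python model. It is not Spark code and does not describe how the Talend components are implemented: the names ToyLineage, simulate and checkpoint_interval are hypothetical, and the model only counts how many transformations separate the current dataset from the point it would be recomputed from. In Spark itself, checkpointing relies on SparkContext.setCheckpointDir and RDD.checkpoint().

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyLineage:
    """Simulated lineage bookkeeping for an iterative job (hypothetical, not Spark's API)."""

    checkpoint_interval: int  # 0 disables checkpointing
    depth: int = 0            # transformations since the source or the last checkpoint
    max_depth: int = 0        # longest lineage reached during the run
    checkpoints: List[int] = field(default_factory=list)  # iterations that were checkpointed

    def step(self, iteration: int) -> None:
        # Each iteration derives a new dataset from the previous one,
        # so the lineage grows by one transformation.
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if self.checkpoint_interval > 0 and iteration % self.checkpoint_interval == 0:
            # Saving the intermediate data to the checkpoint directory truncates
            # the lineage: later recomputation starts from the saved data.
            self.checkpoints.append(iteration)
            self.depth = 0


def simulate(iterations: int, checkpoint_interval: int) -> ToyLineage:
    """Run the toy model for the given number of iterations."""
    lineage = ToyLineage(checkpoint_interval)
    for i in range(1, iterations + 1):
        lineage.step(i)
    return lineage


if __name__ == "__main__":
    print(simulate(1000, 0).max_depth)          # 1000
    print(simulate(1000, 10).max_depth)         # 10
    print(len(simulate(1000, 10).checkpoints))  # 100
```

With 1,000 iterations, the simulated lineage reaches 1,000 steps without checkpointing but never exceeds 10 steps with an interval of 10. A shorter interval bounds the lineage more tightly, at the cost of writing to the checkpoint directory more often.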

Enhancements of Hadoop support

Feature

Description

Cloudera Altus

Support for Cloudera Altus on Azure has been added as a technical preview.

Upgraded support for Hadoop distributions

  • Cloudera CDH V5.13

  • EMR 5.8

Dynamic Hadoop distributions (technical preview)

If the Studio does not support the Cloudera distribution that users want to use, they can add this distribution themselves through a wizard to make it compatible with the Studio.

Users can add Cloudera versions from V5.11.0 (included) up to V6.0.0 (excluded).

The dynamic distribution added this way is not officially supported by Talend.
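The supported range for dynamic Cloudera distributions is half-open: V5.11.0 is accepted, V6.0.0 is not. The following minimal Python sketch expresses that check, assuming plain dotted version strings such as "V5.12.1" or "5.13". The function names are hypothetical, and this is not how the Studio wizard itself validates versions.

```python
from typing import Tuple

# Range stated in these release notes: 5.11.0 (included) to 6.0.0 (excluded).
MIN_VERSION: Tuple[int, int, int] = (5, 11, 0)
MAX_VERSION_EXCLUSIVE: Tuple[int, int, int] = (6, 0, 0)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'V5.12.1', '5.13' or '5' into a (major, minor, patch) tuple."""
    cleaned = text.strip().lstrip("Vv")
    parts = [int(p) for p in cleaned.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected version string: {text!r}")
    parts += [0] * (3 - len(parts))  # pad missing minor/patch components with 0
    return (parts[0], parts[1], parts[2])


def can_add_dynamic_distribution(version: str) -> bool:
    """True if the version lies in the half-open range [5.11.0, 6.0.0)."""
    # Tuples compare element by element, which matches version ordering here.
    return MIN_VERSION <= parse_version(version) < MAX_VERSION_EXCLUSIVE
```

For example, "V5.11.0" and "5.13" fall inside the range, while "5.10.2" and "6.0.0" do not.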