What's new in R2021-01 - 7.3

Talend Big Data products Release Notes

Version: 7.3
Language: English (United States)
Product: Talend Big Data, Talend Big Data Platform, Talend Open Studio for Big Data, Talend Real-Time Big Data Platform
Content: Installation and Upgrade, Release Notes

Big Data: new features

Feature

Description

Product

Assume Role configuration for Databricks 5.5 LTS and 6.4 distributions

When you run a Job on Databricks 5.5 LTS or 6.4 and want to read data from or write data to S3, you can now make your Job temporarily assume a role and the permissions associated with that role.

You no longer have to provide the secret and access keys to Databricks clusters in the tS3Configuration component. Instead, specify the Amazon Resource Name (ARN) of the role to assume in the Spark configuration view, then enter the bucket name and select the Inherit credentials from AWS check box in the Basic settings view of the tS3Configuration component.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Basic Assume Role configuration in tS3Configuration component

When you enable the Assume Role option in the tS3Configuration component, you can now configure the following properties in the Basic settings view to fine-tune your configuration:
  • Serial Number
  • Token Code
  • Tags
  • Transitive Tag Keys
  • Policy ARNs
  • Policy

This feature is now available for the CDP Private Cloud Base 7.1 distribution.
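
The property names listed above match optional parameters of the AWS STS AssumeRole API. As a rough illustration only, the sketch below assembles such values into keyword arguments shaped like those accepted by boto3's STS `assume_role` call. The helper name `build_assume_role_kwargs` and all sample values are hypothetical, and how tS3Configuration passes these settings internally is not described in these notes.

```python
from typing import Any, Dict, List, Optional


def build_assume_role_kwargs(
    role_arn: str,
    session_name: str,
    serial_number: Optional[str] = None,
    token_code: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
    transitive_tag_keys: Optional[List[str]] = None,
    policy_arns: Optional[List[str]] = None,
    policy: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments shaped like those of an STS AssumeRole request.

    Hypothetical helper: only the options that are actually set are included.
    """
    kwargs: Dict[str, Any] = {"RoleArn": role_arn, "RoleSessionName": session_name}
    if serial_number:
        kwargs["SerialNumber"] = serial_number  # MFA device identifier
    if token_code:
        kwargs["TokenCode"] = token_code  # value shown by the MFA device
    if tags:
        kwargs["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    if transitive_tag_keys:
        kwargs["TransitiveTagKeys"] = list(transitive_tag_keys)
    if policy_arns:
        kwargs["PolicyArns"] = [{"arn": arn} for arn in policy_arns]
    if policy:
        kwargs["Policy"] = policy  # inline session policy as a JSON string
    return kwargs
```

With boto3 installed, the resulting dictionary could be passed as `sts_client.assume_role(**kwargs)`. Leaving unset options out of the request keeps the call equivalent to a plain Assume Role configuration.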

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Topic, partition, and key options available in Kafka components

You can now add information about the key and the partition used for the messages in the tKafkaOutput component. The tKafkaInput component exposes this information in its output schema through the following new attributes: topic, partition, and key.

This feature allows you to retrieve and show more information about each Kafka message read from the topic.
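
To make the three attributes concrete, here is a minimal sketch of an output row that carries the topic, partition, and key next to the message content. The `Message` class, the `to_row` helper, and the `payload` column name are hypothetical stand-ins, not the schema that tKafkaInput actually generates.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Message:
    """Hypothetical stand-in for a consumed Kafka message."""
    topic: str
    partition: int
    key: Optional[bytes]
    value: bytes


def to_row(message: Message) -> Dict[str, Any]:
    """Flatten a message into a row that carries topic, partition, and key."""
    return {
        "topic": message.topic,
        "partition": message.partition,
        # A message may have no key, so keep None instead of failing.
        "key": message.key.decode("utf-8") if message.key is not None else None,
        # "payload" is an assumed column name for the message content.
        "payload": message.value.decode("utf-8"),
    }
```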

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

tKafkaCommit available in Spark Streaming Jobs

You can now use the tKafkaCommit component in your Spark Streaming Jobs with Spark v2.0 and onwards in the Local Spark mode. This component allows you to manually control when the offset is committed. It lets you commit offsets in one go rather than relying on an auto-commit at a given time interval.
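
The difference between a manual commit and an interval-based auto-commit can be sketched as follows. `FakeConsumer` and `process_batch` are hypothetical names used for illustration; the sketch assumes a consumer whose auto-commit is disabled and does not show how tKafkaCommit is implemented.

```python
from typing import Callable, Iterable


class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer with auto-commit disabled."""

    def __init__(self) -> None:
        self.commit_calls = 0

    def commit(self) -> None:
        # A real consumer would send the current offsets to the broker here.
        self.commit_calls += 1


def process_batch(
    consumer: FakeConsumer,
    messages: Iterable[str],
    handle: Callable[[str], None],
) -> int:
    """Handle every message, then commit once for the whole batch.

    If handling raises, the commit is never reached, so the batch can be
    read again instead of being marked as consumed by a timer.
    """
    count = 0
    for message in messages:
        handle(message)
        count += 1
    consumer.commit()
    return count
```

Committing only after the whole batch has been handled is the design choice being illustrated: offsets move forward when the work is known to be done, not when a timer fires.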

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Deprecated distributions

The following distributions are now deprecated:
  • HDP 2.6.0 and earlier
  • Cloudera CDH 5.16 and earlier
  • MapR 5.2.0 and earlier
  • Microsoft HD Insight 3.4 and earlier
  • Databricks 3.5 LTS and earlier
  • Cloudera Altus 1.0
  • Dataproc 1.1
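
If you need to check a list of clusters against the deprecations above, a small helper along these lines may be enough. It is a sketch under assumptions: the function names are hypothetical, versions are compared numerically on up to three components, and how patch releases of a listed version are classified is a guess rather than something these notes state.

```python
from typing import Dict, Tuple

# Distribution name -> (version bound, whether earlier versions are included),
# transcribed from the deprecation list above.
DEPRECATED: Dict[str, Tuple[str, bool]] = {
    "HDP": ("2.6.0", True),
    "Cloudera CDH": ("5.16", True),
    "MapR": ("5.2.0", True),
    "Microsoft HD Insight": ("3.4", True),
    "Databricks": ("3.5", True),
    "Cloudera Altus": ("1.0", False),
    "Dataproc": ("1.1", False),
}


def _parse(version: str) -> Tuple[int, ...]:
    """Turn '5.5 LTS' or '2.6' into a zero-padded numeric tuple."""
    numeric = version.split()[0]  # drop a suffix such as 'LTS'
    parts = [int(p) for p in numeric.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def is_deprecated(distribution: str, version: str) -> bool:
    """Return True if the distribution version falls under the list above."""
    if distribution not in DEPRECATED:
        return False
    bound, and_earlier = DEPRECATED[distribution]
    v, b = _parse(version), _parse(bound)
    return v <= b if and_earlier else v == b
```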

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Data Integration: new features

Feature

Description

Product

Shared mode for Talend Studio

Talend Studio now supports the shared mode, which allows each user on the machine where Talend Studio is installed to work with different configuration and workspace folders.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Libraries sharing enhancement

Talend Studio now supports:

  • configuring whether to share libraries to the local libraries repository at startup
  • sharing libraries manually after startup

To improve startup performance, the libraries are not shared at Talend Studio startup by default.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

SAP function extraction path customizable

You can now specify the path where the SAP function generates the files that hold the extracted data. This applies to the following components:

  • tELTSAPMap
  • tSAPDSOInput (with Use FTP-Batch Options selected in the Basic settings view)
  • tSAPODPInput (with Use FTP-Batch Options selected in the Basic settings view)
  • tSAPInfoCubeInput (with Use FTP-Batch Options selected in the Basic settings view)

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

tGPGDecrypt: specifying additional parameters for the GPG decrypt command

The new Use extra parameters option lets you specify additional parameters for the GPG decrypt command.
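
As an illustration of what passing extra parameters to a decrypt command means, the sketch below builds a `gpg` argument list with user-supplied options placed before the `--decrypt` command. The helper name `build_gpg_decrypt_command` and the file names are hypothetical, and the exact command line that tGPGDecrypt builds is not documented here.

```python
from typing import List, Sequence


def build_gpg_decrypt_command(
    input_file: str,
    output_file: str,
    extra_params: Sequence[str] = (),
) -> List[str]:
    """Build a gpg argument list, with extra options ahead of the command.

    The list is only assembled here, never executed; it could be handed to
    subprocess.run() by the caller.
    """
    return ["gpg", *extra_params, "--output", output_file, "--decrypt", input_file]
```

For example, passing `["--batch", "--yes"]` as `extra_params` would request non-interactive operation and automatic confirmation of overwrite prompts.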

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Support for Greenplum 6.x

This release provides support for Greenplum 6.x.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Greenplum components: the default Database driver changed

For Greenplum components, the database driver now defaults to Greenplum.

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

tGreenplumGPLoad improved

Several new features and options have been added to tGreenplumGPLoad, as listed below.

  • The Populate column list based on the schema option in the Basic settings view, which adds the columns defined in the schema to the YAML file.
  • New parameters provided in the Additional options table: LOG_ERRORS, MAX_LINE_LENGTH, EXTERNAL_SCHEMA (_ext_stg_objects), PRELOAD_TRUNCATE, PRELOAD_REUSE_TABLES, PRELOAD_STAGING_TABLE, PRELOAD_FAST_MATCH, SQL_BEFORE LOAD, and SQL_AFTER LOAD.
  • The Remove datafile on successful execution option and the Gzip compress the datafile option in the Advanced settings view, which respectively remove the datafile when the load operation completes successfully and compress the datafile using Gzip.
  • New global variables provided: NB_LINE_INSERTED, NB_LINE_UPDATED, NB_DATA_ERRORS, GPLOAD_STATUS, and GPLOAD_RUNTIME.
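
To give an idea of what adding the schema columns to the YAML file looks like, the sketch below renders a COLUMNS fragment in the style of a gpload control file from a list of column names and types. The function name `render_columns_section`, the indentation, and the sample columns are assumptions; the exact file that tGreenplumGPLoad writes is not shown in these notes.

```python
from typing import List, Tuple


def render_columns_section(schema: List[Tuple[str, str]], indent: int = 4) -> str:
    """Render a '- COLUMNS:' fragment with one 'name: type' entry per column.

    Hypothetical helper: it produces text only and does not validate types.
    """
    pad = " " * indent
    lines = [f"{pad}- COLUMNS:"]
    # Each column becomes a nested list item, in schema order.
    lines += [f"{pad}    - {name}: {dtype}" for name, dtype in schema]
    return "\n".join(lines)
```

Column order is preserved, which matters when the fragment describes the field order of the data file being loaded.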

Talend Big Data

Talend Big Data Platform

Talend Real-Time Big Data Platform

Data Quality: new features

Feature

Description

Product

Shared mode

Talend Studio now supports the shared mode. If you enable it, some paths change:
  • For tBRMS, the path to the Drools folder is C:/Users/user-account/studio-path/Drools/
  • For tDqReportRun, the path to the Generated reports folder is C:/Users/user-account/studio-path/Generated reports/
  • For the synonym indexes, the path to the addons folder is C:/Users/user-account/studio-path/addons/
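
The three locations above share the same prefix, so they can be derived from two values. The sketch below does this with plain string formatting; `shared_mode_paths` is a hypothetical helper, and `user_account` and `studio_path` simply stand for the placeholders used in the paths above, whose actual values depend on your installation.

```python
from typing import Dict


def shared_mode_paths(user_account: str, studio_path: str) -> Dict[str, str]:
    """Return the shared-mode folders listed above for one user."""
    base = f"C:/Users/{user_account}/{studio_path}"
    return {
        "tBRMS Drools folder": f"{base}/Drools/",
        "tDqReportRun Generated reports folder": f"{base}/Generated reports/",
        "synonym indexes addons folder": f"{base}/addons/",
    }
```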

Talend Big Data Platform

Talend Real-Time Big Data Platform

Supported databases

SAP Hana is now supported in the Profiling perspective for Table, View and Calculation view schemas.

Talend Big Data Platform

Talend Real-Time Big Data Platform

New components

The tSAPHanaValidRows and tSAPHanaInvalidRows components check SAP Hana database rows against specific data quality patterns (regular expression) or data quality rules (business rule).
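
The idea of checking rows against a regular expression and routing them to a valid or an invalid output can be sketched in a few lines. This is a conceptual illustration only: `split_rows` is a hypothetical function that works on in-memory dictionaries, whereas the components operate on SAP Hana database rows in a way these notes do not detail.

```python
import re
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def split_rows(rows: Iterable[Row], column: str, pattern: str) -> Tuple[List[Row], List[Row]]:
    """Split rows into (valid, invalid) by matching one column against a pattern.

    A row is valid only if the column holds a string that matches the whole
    pattern; missing or non-string values are treated as invalid.
    """
    regex = re.compile(pattern)
    valid: List[Row] = []
    invalid: List[Row] = []
    for row in rows:
        value = row.get(column)
        if isinstance(value, str) and regex.fullmatch(value):
            valid.append(row)
        else:
            invalid.append(row)
    return valid, invalid
```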

Talend Big Data Platform

Talend Real-Time Big Data Platform

tDataMasking

tDataUnmasking

The Standard versions of these components now support the Dynamic data type.

Talend Big Data Platform

Talend Real-Time Big Data Platform