Big Data - Cloud - 8.0

Talend Release Notes

Version: Cloud, 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud API Services Platform, Talend Cloud Big Data, Talend Cloud Big Data Platform, Talend Cloud Data Fabric, Talend Cloud Data Integration, Talend Cloud Data Management Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Cloud API Designer, Talend Cloud API Tester, Talend Cloud Data Inventory, Talend Cloud Data Preparation, Talend Cloud Data Stewardship, Talend Cloud Pipeline Designer, Talend Data Preparation, Talend Data Stewardship, Talend Management Console, Talend Studio
Content: Installation and Upgrade, Release Notes
Last publication date: 2024-04-16

Feature

Description

Available in

New component tManagePartitions to manage Spark dataset partitions in Spark Batch Jobs

A new component, tManagePartitions, is available in your Spark Batch Jobs. It replaces tPartition, which is now deprecated. This component allows you to manage your partitions by visually defining how an input dataset is partitioned.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data
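
As a rough illustration of what partitioning a dataset by a key means in the tManagePartitions entry above, the sketch below distributes rows across a fixed number of partitions using a hash of one column. It is plain Python, not Talend or Spark code: the function name `hash_partition`, the sample rows, and the use of CRC32 as the hash function are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition from a stable hash of its key value."""
    partitions = defaultdict(list)
    for row in rows:
        # zlib.crc32 is deterministic across runs, unlike the built-in hash() for str.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return dict(partitions)

# Hypothetical sample data.
rows = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 7},
    {"customer": "carol", "amount": 3},
]
partitions = hash_partition(rows, key="customer", num_partitions=3)
for index in sorted(partitions):
    print(index, partitions[index])
```

Rows that share a key value always land in the same partition, which is why key-based partitioning helps operations that group or join on that key.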

Support for auto partitioning with tManagePartitions in Spark Batch Jobs

A new option, Auto, is available in the Partitioning strategy drop-down list in the Basic settings view of tManagePartitions in your Spark Batch Jobs. With this option, the best partitioning strategy to apply to a dataset is calculated for you.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data

New component tCacheClear to clear Spark cache in Spark Batch Jobs

A new component, tCacheClear, is available in your Spark Batch Jobs. This component allows you to remove from memory the RDD (Resilient Distributed Dataset) cache stored by tCacheOut.

Clearing the cache is a good practice. For example, if the caching layer becomes full, Spark starts evicting data from memory using the LRU (least recently used) strategy. Unpersisting explicitly gives you more control over what is evicted. Also, the more free space you have in memory, the more Spark can use for execution, for example for building hash maps.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data
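
The difference between LRU eviction and explicit clearing described in the tCacheClear entry above can be sketched with a toy cache. This is a simplified model in plain Python, not Spark's actual storage management: the class `LruCache`, its `unpersist` method, and the dataset names are hypothetical.

```python
from collections import OrderedDict

class LruCache:
    """Toy cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def put(self, name, data):
        """Store an entry and return the name of any entry evicted to make room."""
        if name in self.entries:
            self.entries.move_to_end(name)
        self.entries[name] = data
        if len(self.entries) > self.capacity:
            # Implicit eviction: the oldest entry goes, whether you still need it or not.
            evicted, _ = self.entries.popitem(last=False)
            return evicted
        return None

    def unpersist(self, name):
        """Explicit removal: the caller decides what leaves the cache."""
        self.entries.pop(name, None)

cache = LruCache(capacity=2)
cache.put("lookup", [1, 2, 3])
cache.put("staging", [4, 5])
evicted = cache.put("report", [6])    # cache is full, so "lookup" is evicted
print(evicted)                        # lookup

cache.unpersist("staging")            # free space deliberately
evicted_after_clear = cache.put("lookup", [1, 2, 3])
print(evicted_after_clear)            # None
```

In the first case the cache chooses the victim. In the second, clearing an entry that is no longer needed leaves room for the new one without evicting anything else.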

Support for Kudu format with tImpalaCreateTable in Standard Jobs

The Kudu format is supported when you create a table with tImpalaCreateTable in your Standard Jobs. When you work with a Kudu table, you can also configure the number of partitions to create with the new Kudu partition parameter.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data
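
For context on the Kudu entry above, Impala creates Kudu tables with a CREATE TABLE statement that declares a primary key, a partitioning clause, and STORED AS KUDU. The sketch below builds such a statement as a string. The helper `kudu_create_table_sql`, the table, and the columns are hypothetical, and whether tImpalaCreateTable generates a statement of exactly this shape is an assumption.

```python
def kudu_create_table_sql(table, columns, primary_key, num_partitions):
    """Build an Impala CREATE TABLE statement for a hash-partitioned Kudu table."""
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} ({column_defs}, PRIMARY KEY ({primary_key})) "
        f"PARTITION BY HASH ({primary_key}) PARTITIONS {num_partitions} "
        "STORED AS KUDU"
    )

# Hypothetical table definition; num_partitions plays the role of the partition count.
ddl = kudu_create_table_sql(
    table="customers",
    columns=[("id", "BIGINT"), ("name", "STRING")],
    primary_key="id",
    num_partitions=4,
)
print(ddl)
```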

New component tHBaseDeleteRow to delete rows from an HBase table in Standard Jobs

A new component, tHBaseDeleteRow, is available in your Standard Jobs. This component allows you to delete rows from an HBase table by providing their row keys.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data

Ability to run Spark Batch Jobs with HBase components using Knox with CDP Public Cloud

You can use Knox with HBase in your Spark Batch Jobs running on CDP Public Cloud. You can configure Knox either in the tHBaseConfiguration parameters or in the HBase metadata wizard.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data

Support for parallel reading from HBase table in Spark Batch Jobs

A new option, Partition by table regions, is available in the Basic settings view of tHBaseInput in your Spark Batch Jobs. This option allows you to read the data from an HBase table in parallel, based on its number of regions.

Available in:

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

All subscription-based Talend products with Big Data
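
The idea behind reading by table regions, as in the tHBaseInput entry above, is that each region covers a contiguous range of row keys, so one scan per region can run independently of the others. The sketch below mimics this with an in-memory dictionary and a thread pool. It does not use the HBase client or Spark: the table contents, the region boundaries, and the function `scan_region` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for an HBase table: row key -> value.
table = {f"row{i:03d}": i for i in range(10)}

# Hypothetical region boundaries as (start_key, end_key); "" means unbounded.
regions = [("", "row003"), ("row003", "row007"), ("row007", "")]

def scan_region(bounds):
    """Return the rows whose key falls in [start_key, end_key)."""
    start_key, end_key = bounds
    return [
        (key, table[key])
        for key in sorted(table)
        if key >= start_key and (end_key == "" or key < end_key)
    ]

# One scan task per region, executed concurrently.
with ThreadPoolExecutor(max_workers=len(regions)) as pool:
    partitions = list(pool.map(scan_region, regions))

for bounds, rows in zip(regions, partitions):
    print(bounds, len(rows))
```

Because the key ranges do not overlap, every row is read exactly once, and the number of parallel tasks equals the number of regions.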