Talend Cloud Data Preparation atop Talend Cloud Data Inventory - Cloud

Talend Cloud Data Preparation User Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Preparation
Content: Administration and Monitoring > Managing connections; Data Quality and Preparation > Cleansing data; Data Quality and Preparation > Managing datasets
Last publication date: 2024-03-22

The common inventory of datasets for Talend Cloud Data Inventory, Talend Cloud Pipeline Designer, and Talend Cloud Data Preparation brings a unified experience across the Talend Cloud applications.

Even if you are not subscribed to the standalone Talend Cloud Data Inventory application, the common architecture lets you benefit from several new features and improvements in Talend Cloud Data Preparation, compared to the hybrid and on-premises versions.

These additions also introduce new concepts that affect how you use Talend Cloud Data Preparation. Here are the most notable changes:

  • New concept of reusable connections

    To create a remote dataset, stored in Salesforce or Amazon S3 for example, you previously had to use the Add dataset button, select the platform, and enter your connection information each time. Now you can set up this connection information only once, save it as a reusable Connection, and reuse it to create new datasets at any time. These connections to your datastores are listed in the new Connections tab.

    Connections tab opened.
  • Extended native connectivity

    A whole new range of connection types is now available natively in the application. Create preparations on datasets from databases, file systems, distributed systems, platforms and more. For the full list of sources that you can connect to, see the List of supported connectors.

    However, keep in mind that Talend Cloud Data Preparation does not support hierarchical formats and does not support streaming.

  • Direct upload for local files

    On the Datasets page, a new Drop a file or browse button lets you quickly import your local files. You can either drag and drop your file onto the Datasets page, or browse for it using the file explorer. A form then opens where you can configure the dataset, or simply Auto-detect the parameters.

    Drop your file anywhere button illustrated.
  • New indicators in the dataset list

    When opening your list of datasets, you will notice new columns containing new indicators.

    New indicators in the dataset list shown.
    • First, a quality bar details the distribution of empty, valid, and invalid records across the dataset. Hover over each color to see the exact percentage and number of records.
    • In addition, a new feature lets you apply a rating score to the dataset based on its quality and other personal criteria. The rating score shown in the dataset list is the average of the scores applied by all the users who have access to the dataset.
    • Finally, the trust score, represented by the shields icon, gives you an at-a-glance overall score of the quality and completeness of your dataset. It aggregates several indicators, such as the quality itself or the presence of a rating score or certification.
  • More flexible sharing

    The new sharing dialog allows you to assign a role to other users when sharing connections, datasets, or preparation folders with them. The Viewer, Editor, and Owner roles each come with a different level of permissions on the actions that can be performed on shared objects. To assign a specific role to a collaborator, open the sharing dialog, select the user or group you want to share your object with, and click Add as....

    The role you have assigned to someone can be updated at any time, and you can even remove yourself from the list of contributors on a specific shared object.

    Add as... drop-down list opened.
  • Creating a preparation

    Using the Add dataset button, you can create a dataset when you are creating a preparation.

    One dataset selected.

    However, another way to easily create a preparation has been introduced. Directly from your list of datasets, hover over a dataset and select the Talend Cloud Data Preparation icon. Click Add to start cleansing your data right away.

    Talend Cloud Data Preparation icon selected.
  • Dataset provenance and destination

    In addition to serving as a shortcut for creating preparations, the Talend Cloud Data Preparation icon that appears when you hover over a dataset has another useful purpose. When you click this icon for a given dataset, you can see all the preparations that have been created from it, along with their creators, giving you more insight into how your data is used.

    Talend Cloud Data Preparation icon selected.
  • Removal of live datasets

    Given the extended native connectivity and features brought by this release, creating and using live datasets is no longer possible. All existing live datasets are now unusable.

  • Make line as header

    This function is no longer available from the functions panel of your preparations. Instead, you can select which row to use as the header for your dataset in the dataset properties at import time.

  • Excel files with multiple worksheets

    When uploading an Excel file that contains multiple sheets, only the first one is imported by default, but you can choose which sheet to import in the dataset creation form. However, the Auto-detect feature is not supported for such files.
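
The last two items (choosing the header row at import time, and importing the first sheet unless another one is chosen) can be pictured with a small Python sketch. This is only an illustration of the described behavior, not Talend code, and it does not call any Talend API. The workbook is modeled as a plain dictionary of sheet names to rows, and the names select_sheet and apply_header_row are hypothetical. Discarding the rows above the chosen header is an assumption of this sketch; the guide does not say what happens to them.

```python
def select_sheet(workbook, sheet_name=None):
    """Return (name, rows) for the chosen sheet, or the first sheet by default.

    `workbook` is a hypothetical in-memory model: a dict mapping sheet names
    to lists of rows, in the order the sheets appear in the file.
    """
    if not workbook:
        raise ValueError("workbook has no sheets")
    if sheet_name is None:
        # No explicit choice: fall back to the first sheet.
        sheet_name = next(iter(workbook))
    if sheet_name not in workbook:
        raise KeyError(f"unknown sheet: {sheet_name!r}")
    return sheet_name, workbook[sheet_name]


def apply_header_row(rows, header_index=0):
    """Use rows[header_index] as column names and return the rows after it.

    Assumption of this sketch: rows above the header are discarded.
    """
    if not 0 <= header_index < len(rows):
        raise IndexError("header_index is outside the sheet")
    header = [str(cell) for cell in rows[header_index]]
    return [dict(zip(header, row)) for row in rows[header_index + 1:]]


# Illustrative sample data only.
workbook = {
    "Customers": [
        ["Quarterly export"],  # a title line above the real header
        ["id", "name"],
        [1, "Ada"],
        [2, "Grace"],
    ],
    "Orders": [
        ["order_id", "customer_id"],
        [100, 1],
    ],
}

name, rows = select_sheet(workbook)               # defaults to the first sheet
records = apply_header_row(rows, header_index=1)  # second row holds the column names
print(name, records)
```

Passing sheet_name="Orders" to select_sheet mirrors picking a different sheet in the dataset creation form, and header_index mirrors the header row setting in the dataset properties.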