Talend Cloud Data Preparation architecture (Beta) - Cloud

Talend Cloud Data Preparation User Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Preparation
Last publication date: 2024-02-21

This architecture diagram identifies the functional blocks of Talend Cloud Data Preparation.
[Diagram: functional blocks of Talend Cloud Data Preparation]

The diagram is divided into two main parts: the local network and the cloud infrastructure.

Local network

The local network includes a web browser, Talend Studio, a Remote Engine, and a Remote Engine Gen2.

  • From your web browser, you can access Talend Cloud Data Preparation, Talend Dictionary Service, and Talend Management Console.
  • From Talend Studio, you can use Talend Cloud Data Preparation features through the tDatasetInput, tDatasetOutput, and tDataprepRun components. You can create datasets from various databases and export them to Talend Cloud Data Preparation, or use a preparation directly in a data integration Job or Spark Job.
  • The Remote Engine is used to run Jobs that use the Data Preparation components, and to run artifacts and tasks on premises.
  • The Remote Engine Gen2 is used to run objects from the Talend Cloud applications, such as preparations, as well as to create connections and fetch data samples.
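
The division of work between the two local engines can be summarized as a small lookup. The following Python sketch is only an illustration of the list above, not part of any Talend API: the Engine class, the engines_for function, and the work labels ("job", "preparation", and so on) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    runs: frozenset  # kinds of work this engine handles


# Hypothetical labels for the kinds of work named in the list above.
LOCAL_ENGINES = (
    # The first-generation Remote Engine: Jobs, artifacts, and tasks on premises.
    Engine("Remote Engine", frozenset({"job", "artifact", "task"})),
    # Remote Engine Gen2: cloud application objects, connections, data samples.
    Engine("Remote Engine Gen2",
           frozenset({"preparation", "connection", "data sample"})),
)


def engines_for(work, engines=LOCAL_ENGINES):
    """Return the names of the engines that handle the given kind of work."""
    return [engine.name for engine in engines if work in engine.runs]


print(engines_for("preparation"))  # ['Remote Engine Gen2']
print(engines_for("job"))          # ['Remote Engine']
```

Under these assumptions, a Job built with the Data Preparation components maps to the Remote Engine, while a preparation opened from a Talend Cloud application maps to the Remote Engine Gen2.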

Cloud infrastructure

The cloud infrastructure includes Talend Cloud Data Preparation, which relies on the Dataset service, along with Talend Management Console, Talend Dictionary Service, and the Cloud Engine for Design.

  • The Dataset service provides the unified dataset list for Talend Cloud Data Preparation, Talend Cloud Data Inventory, and Talend Cloud Pipeline Designer.
  • In Talend Management Console, you can administer roles, users, projects, and licenses. You can create new users for the cloud applications and assign them to custom groups, then define roles and assign them to your users. Talend Management Console is also used to import your license files and to create projects to collaborate on in Talend Studio. In addition, you can enable data and file transfer, data integration, and access to shared data sources for web users. You can, for example, import and use preconfigured sample Tasks, or design Tasks that automate the exchange and synchronization of data between applications.
  • In Talend Cloud Data Preparation, you can import your data from local files or other sources, and cleanse or enrich it by creating new preparations.
  • In Talend Dictionary Service, you can add, remove, or modify the semantic categories that are applied to each column in your data when it is opened in Talend Cloud Data Preparation.
  • The Cloud Engine for Design is used to run artifacts, tasks, and preparations in the cloud, as well as to create connections and fetch data samples.
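
The role of the Dataset service as a shared backend can be pictured as a simple dependency map. The Python sketch below is a reading aid under stated assumptions, not a description of how the service is implemented: the USES mapping and the consumers_of function are hypothetical names, and the map records only the relationships stated in the list above.

```python
# Application -> shared services it draws on (hypothetical structure;
# only the relationships stated in the text are recorded here).
USES = {
    "Talend Cloud Data Preparation": ["Dataset service"],
    "Talend Cloud Data Inventory": ["Dataset service"],
    "Talend Cloud Pipeline Designer": ["Dataset service"],
}


def consumers_of(service, uses=USES):
    """Return, in alphabetical order, the applications that draw on a service."""
    return sorted(app for app, services in uses.items() if service in services)


for app in consumers_of("Dataset service"):
    print(app)
```

Because all three applications read the same dataset list, a dataset created from one of them would, on this reading, be visible from the other two.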