Talend Cloud Data Inventory concepts - Cloud

Talend Cloud Data Inventory User Guide

These definitions will help you understand the main concepts of Talend Cloud Data Inventory.

  • Connection: Connections are environments or systems where datasets are stored, such as databases, file systems, and distributed systems or platforms. The connection information for these systems only needs to be set up once, since connections are reusable.
  • Dataset: Datasets are collections of data. They can be database tables, file names, topics (Kafka), file paths (HDFS), etc. You can also create test datasets that you enter manually and store in a test connection, or import local files as datasets. Several datasets can be connected to the same system (one-to-many connectivity) and are stored in reusable connections.
  • Sample: Your data will be visible in the form of a sample, retrieved from the dataset metadata.
  • Semantic type: The semantic type of a column or record corresponds to the type of data that can be found in it, such as names, zip codes, phone numbers, coordinates, etc. The Talend Cloud applications all benefit from semantic awareness, meaning that when you look at your sample data, it will be automatically categorized using the default semantic types, or the ones that you have created yourself.
  • Trust Score: A global quality indicator that aggregates several metrics into a single score ranging from 0 to 5.
  • Custom attributes: Custom attributes can be applied to your datasets. They allow you to add metadata information following a set of predefined rules and can also be used to help you search and sort your datasets.
  • Tags: Tags are a second, free-form tagging method. They let you add any text as metadata to your Talend Cloud objects, like sticking a Post-it note on them.
  • Cloud Engine: The Cloud Engine is a built-in runner that lets users process data without having to set up any processing engines. With this engine, you can run two objects in parallel. For advanced data processing, installing the secure Remote Engine is recommended.
  • Remote Engine: A Remote Engine is a secure execution engine on which you can safely run objects. It allows you to have control over your execution environment and resources as you are able to create and configure the engine in your own environment (Virtual Private Cloud or on premises).

    A Remote Engine ensures:

    • Data processing in a safe and secure environment as Talend never has access to your data and resources.
    • Optimal performance and security by increasing data locality instead of moving large volumes of data to the computation.

Relationship between connections, datasets, and other entities

From the connection metadata, your data is retrieved and can be visualized as a sample. From there you can use other tools like Talend Cloud Data Preparation or Talend Cloud Pipeline Designer to further transform your data.
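
The relationship between these entities can be pictured with a small, purely illustrative Python model. This is a sketch of the concepts only, not the Talend Cloud API: the class names, fields, and the datasets_for helper are all hypothetical, and the sample here is simply the first few records of a dataset.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical model: a reusable description of a system holding datasets."""
    name: str
    system: str  # e.g. "database", "HDFS", "Kafka"


@dataclass
class Dataset:
    """Hypothetical model: a collection of data reached through one connection."""
    name: str
    connection: Connection
    records: list = field(default_factory=list)
    tags: set = field(default_factory=set)  # free-form text labels

    def sample(self, size: int = 2) -> list:
        """Return the first `size` records as a preview of the data."""
        return self.records[:size]


def datasets_for(connection: Connection, datasets: list) -> list:
    """List the names of the datasets that reuse the given connection."""
    return [d.name for d in datasets if d.connection is connection]


# One connection is set up once and reused by several datasets (one-to-many).
warehouse = Connection(name="warehouse", system="database")
customers = Dataset(
    name="customers",
    connection=warehouse,
    records=[{"zip": "75001"}, {"zip": "10115"}, {"zip": "94105"}],
)
orders = Dataset(name="orders", connection=warehouse)
```

In this sketch, datasets_for(warehouse, [customers, orders]) lists both datasets because they share the same connection object, and customers.sample() returns only a preview of the records rather than the full dataset.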