Harvesting metadata - Cloud

Talend Cloud Data Catalog User Guide

Metadata harvesting means collecting all metadata from a data source.

You harvest metadata by using Talend Cloud Data Catalog bridges.

A bridge is a connector dedicated to a platform. It uses a specific driver to connect to a data source system and collect its metadata.

The following list presents the types of data sources from which you can harvest metadata. Availability depends on your Talend Cloud Data Catalog edition: Standard, Advanced, or Advanced Plus.
  • Harvesting from any supported data store technologies
  • Harvesting from any supported Data Model tools
  • Data Integration with DI, ETL and ELT tools
      • Harvesting from Talend Data Integration, Talend MDM and Talend Data Preparation
      • Harvesting from any supported Data Integration tools
  • Data Integration with SQL Scripts and other codes
      • Harvesting from HiveQL Scripting
      • Harvesting from any supported SQL Scripting
  • Business Intelligence (BI Reporting)
      • Harvesting from Tableau or Qlik
      • Harvesting from any supported Business Intelligence tools
  • Harvesting from any supported Metadata Management tools (such as Apache Atlas or Cloudera Navigator)
  • Business Applications
      • Harvesting from Salesforce
      • Harvesting from any supported Business Application tools (such as SAP Business Warehouse 4 HANA)

Install a remote harvesting server on premises when you need to harvest metadata that is not accessible from Talend Cloud Data Catalog, or when you need to use a bridge that is not available in the embedded harvesting agent.

In Talend Cloud Data Catalog Bridges on Talend Help Center, these bridges are identified with a note indicating that they are not available in Talend Cloud Data Catalog by default.
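The rule above can be expressed as a simple check. The following is a minimal sketch for planning purposes only: the `Source` class, its fields, and the example source names are hypothetical and are not part of any Talend API.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical planning record for one system to harvest."""
    name: str
    reachable_from_cloud: bool      # can Talend Cloud Data Catalog reach it directly?
    bridge_in_embedded_agent: bool  # is its bridge available in the embedded agent?


def needs_remote_harvesting_server(source: Source) -> bool:
    """Return True when either condition from the text applies."""
    return (not source.reachable_from_cloud
            or not source.bridge_in_embedded_agent)


# Example inventory (names and values are made up for illustration).
sources = [
    Source("cloud_warehouse", True, True),
    Source("on_prem_database", False, True),
    Source("legacy_tool", True, False),
]

for s in sources:
    where = "remote server" if needs_remote_harvesting_server(s) else "embedded agent"
    print(f"{s.name}: {where}")
```

Whether a given bridge is available in the embedded harvesting agent still has to be checked against the bridge documentation; the sketch only records the outcome of that check.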

Before harvesting metadata

Before harvesting metadata, analyze where the metadata resides, which technologies are required to extract it, and which process to follow to ensure a proper extraction.

When harvesting metadata in a Talend Cloud Data Catalog project, you should follow a specific order:
  • Identify source data stores, such as operational data stores.
  • Identify data transformation processes, such as ETL or ELT.
  • Identify business intelligence systems.
  • Identify existing conceptual models.
  • Configure a bridge and harvest metadata for each system.
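The order above can be sketched as a small planning helper that sorts an inventory of systems by category before you configure bridges. This is an illustration only: the category keys, the `plan_harvest` function, and the system names are assumptions made for this example and do not correspond to a Talend API.

```python
# Categories in the order given in the list above (keys are hypothetical).
HARVEST_ORDER = [
    "data_store",             # source data stores
    "transformation",         # ETL or ELT processes
    "business_intelligence",  # BI systems
    "conceptual_model",       # existing conceptual models
]


def plan_harvest(systems):
    """Sort (name, category) pairs into the recommended order.

    sorted() is stable, so systems in the same category keep
    the order in which they were listed.
    """
    rank = {category: i for i, category in enumerate(HARVEST_ORDER)}
    unknown = [name for name, category in systems if category not in rank]
    if unknown:
        raise ValueError(f"Unknown category for: {', '.join(unknown)}")
    return sorted(systems, key=lambda system: rank[system[1]])


# Example inventory (names are made up for illustration).
inventory = [
    ("sales_dashboards", "business_intelligence"),
    ("nightly_etl", "transformation"),
    ("orders_db", "data_store"),
    ("customer_model", "conceptual_model"),
]

for name, category in plan_harvest(inventory):
    print(f"{category}: configure a bridge and harvest {name}")
```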

You should also organize your metadata repository with labeled folders, for example, one folder for each category of metadata.
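As a rough illustration of a folder-per-category layout, the sketch below groups harvested models under one folder label each. The `folder_layout` function, the folder labels, and the model names are hypothetical; in practice you create the folders in the Talend Cloud Data Catalog repository itself.

```python
from collections import defaultdict


def folder_layout(models):
    """Group model names under one folder per metadata category.

    `models` maps a model name to its category label.
    """
    layout = defaultdict(list)
    for model, category in models.items():
        layout[category].append(model)
    # Sort folders and their contents so the layout is predictable.
    return {folder: sorted(names) for folder, names in sorted(layout.items())}


# Example mapping (labels and names are made up for illustration).
models = {
    "orders_db": "Databases",
    "nightly_etl": "Data Integration",
    "sales_dashboards": "Business Intelligence",
    "crm_db": "Databases",
}

for folder, names in folder_layout(models).items():
    print(f"{folder}/")
    for name in names:
        print(f"    {name}")
```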