Harvesting metadata - 7.3

Talend Data Catalog User Guide

Version: 7.3
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Catalog
Content: Data Governance
Last publication date: 2023-08-09

Metadata harvesting means collecting all metadata from a data source.

You harvest metadata by using Talend Data Catalog bridges.

A bridge is a connector dedicated to a platform. It uses a specific driver to connect to a data source system and collect its metadata.

The following table presents the types of data sources from which you can harvest metadata, depending on your edition.

Talend Data Catalog editions: Standard, Advanced, Advanced Plus

  • Harvesting from any supported data store technologies
  • Harvesting from any supported Data Model tools

Data Integration with DI, ETL and ELT tools
  • Harvesting from Talend Data Integration, Talend MDM and Talend Data Preparation
  • Harvesting from any supported Data Integration tools

Data Integration with SQL Scripts and other codes
  • Harvesting from HiveQL Scripting
  • Harvesting from any supported SQL Scripting

Business Intelligence (BI Reporting)
  • Harvesting from Tableau or Qlik
  • Harvesting from any supported Business Intelligence tools

  • Harvesting from any supported Metadata Management tools (such as Apache Atlas or Cloudera Navigator)

Business Applications
  • Harvesting from Salesforce
  • Harvesting from any supported Business Application tools (such as SAP Business Warehouse 4 HANA)

For more information about the bridges, see Talend Data Catalog Bridges on Talend Help Center.

Before harvesting metadata

Before harvesting metadata, analyze where the metadata resides, which technologies are required to extract it, and which process to follow to ensure a proper extraction.

When harvesting metadata in a Talend Data Catalog project, follow this order:
  • Identify source data stores, such as operational data stores.
  • Identify data transformation processes, such as ETL or ELT.
  • Identify business intelligence systems.
  • Identify existing conceptual models.
  • Configure a bridge and harvest metadata for each system.
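
The order above can be sketched as a small planning script that sorts an inventory of systems before you configure bridges for them. This is only an illustration and does not call any Talend Data Catalog API: the category labels, the System class, the plan_harvest function and the system names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical category labels, listed in the order recommended above.
HARVEST_ORDER = [
    "source data store",
    "data transformation",
    "business intelligence",
    "conceptual model",
]


@dataclass(frozen=True)
class System:
    name: str      # illustrative system name, not a real bridge name
    category: str  # expected to be one of HARVEST_ORDER


def plan_harvest(systems):
    """Return the systems sorted by the recommended identification order."""
    rank = {category: i for i, category in enumerate(HARVEST_ORDER)}
    unknown = [s.name for s in systems if s.category not in rank]
    if unknown:
        raise ValueError(f"Unknown category for: {', '.join(unknown)}")
    # sorted() is stable, so systems in the same category keep their input order.
    return sorted(systems, key=lambda s: rank[s.category])


# Made-up inventory, deliberately listed out of order.
inventory = [
    System("sales_dashboards", "business intelligence"),
    System("orders_db", "source data store"),
    System("nightly_etl", "data transformation"),
    System("customer_model", "conceptual model"),
]

for step, system in enumerate(plan_harvest(inventory), start=1):
    print(f"{step}. {system.name} ({system.category})")
```

Once the inventory is ordered this way, the last step of the list applies to each entry in turn: configure a bridge for the system and harvest its metadata.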

You should also organize your metadata repository with labeled folders, for example, one folder for each category of metadata.
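
As a rough sketch of the folder layout, the snippet below groups harvested content by category so that each category maps to one labeled folder. The folder labels, the model names and the folder_layout function are assumptions made for illustration. In practice you create the folders in the Talend Data Catalog repository itself.

```python
from collections import defaultdict


def folder_layout(models):
    """Group (model_name, category) pairs into one labeled folder per category."""
    folders = defaultdict(list)
    for name, category in models:
        folders[category].append(name)
    # Sort folders and their contents so the listing is deterministic.
    return {folder: sorted(names) for folder, names in sorted(folders.items())}


# Hypothetical harvested models and the category folder each one belongs to.
layout = folder_layout([
    ("orders_db", "Data Stores"),
    ("nightly_etl", "Data Integration"),
    ("crm_db", "Data Stores"),
    ("sales_dashboards", "Business Intelligence"),
])

for folder, names in layout.items():
    print(folder)
    for name in names:
        print(f"  {name}")
```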