Harvesting metadata - 8.0

Talend Data Catalog User Guide

Version: 8.0
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Catalog
Content: Data Governance
Last publication date: 2023-09-26

Metadata harvesting means collecting all metadata from a data source.

You harvest metadata by using Talend Data Catalog bridges.

A bridge is a connector dedicated to a platform. It uses a specific driver to connect to a data source system and collect its metadata.
Note: The pre-installed database drivers in the <TDC_HOME>\TalendDataCatalog\tomcat\shared folder are for connecting to the Talend Data Catalog repository database, not for harvesting. To harvest metadata, install the driver required to connect to the data source system and update the driver location parameters accordingly. For more information, see Importing metadata.
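Before you update a driver location parameter, it can help to confirm that the folder you intend to point to actually contains driver files. The following is a minimal sketch, not a Talend Data Catalog utility: the function name `find_driver_jars` is hypothetical, and it assumes the drivers are delivered as .jar files, which may not hold for every data source.

```python
import tempfile
from pathlib import Path


def find_driver_jars(driver_dir):
    """Return the sorted names of .jar files found directly in driver_dir.

    Assumption: drivers are shipped as .jar files (typical for JDBC).
    """
    folder = Path(driver_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Driver folder not found: {folder}")
    return sorted(p.name for p in folder.glob("*.jar"))


# Demonstration on a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "example-jdbc-driver.jar").touch()
    (Path(demo_dir) / "readme.txt").touch()
    print(find_driver_jars(demo_dir))
```

An empty result suggests that the driver has not been installed in that folder yet.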
The following table presents the types of data sources from which you can harvest metadata, depending on your edition.
Talend Data Catalog editions: Standard, Advanced, Advanced Plus
  • Harvesting from any supported data store technologies
  • Harvesting from any supported Data Model tools
  • Data Integration with DI, ETL and ELT tools
      • Harvesting from Talend Data Integration, Talend MDM and Talend Data Preparation
      • Harvesting from any supported Data Integration tools
  • Data Integration with SQL Scripts and other codes
      • Harvesting from HiveQL Scripting
      • Harvesting from any supported SQL Scripting
  • Business Intelligence (BI Reporting)
      • Harvesting from Tableau or Qlik
      • Harvesting from any supported Business Intelligence tools
  • Harvesting from any supported Metadata Management tools (such as Apache Atlas or Cloudera Navigator)
  • Business Applications
      • Harvesting from Salesforce
      • Harvesting from any supported Business Application tools (such as SAP Business Warehouse 4 HANA)

For more information about the bridges, see Talend Data Catalog Bridges on Talend Help Center.

Before harvesting metadata

Before harvesting metadata, analyze where the metadata resides, which technologies are required to extract it, and which process to follow to ensure a proper extraction.

Ensure that you have the proper connectivity to the external metadata source.

Ensure that you have full access to any auxiliary resources. Which resources are needed depends on the external format you are connecting to.

When harvesting metadata in a Talend Data Catalog project, follow this order:
  • Identify source data stores, such as operational data stores.
  • Identify data transformation processes, such as ETL or ELT.
  • Identify business intelligence systems.
  • Identify existing conceptual models.
  • Configure a bridge and harvest metadata for each system.
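The order above can be sketched as a simple sort over an inventory of systems. This is only an illustration for planning purposes, not part of Talend Data Catalog: the category names, the `HARVEST_ORDER` list, the `plan_harvest` function and the sample inventory are all hypothetical.

```python
# Hypothetical category names, listed in the recommended harvesting order.
HARVEST_ORDER = [
    "data_store",
    "data_transformation",
    "business_intelligence",
    "conceptual_model",
]


def plan_harvest(systems):
    """Return (name, category) pairs sorted into the recommended order.

    Raises ValueError if a system has a category not in HARVEST_ORDER.
    """
    rank = {category: i for i, category in enumerate(HARVEST_ORDER)}
    unknown = [name for name, category in systems if category not in rank]
    if unknown:
        raise ValueError(f"Unknown category for: {', '.join(unknown)}")
    # sorted() is stable, so systems in the same category keep their input order.
    return sorted(systems, key=lambda item: rank[item[1]])


# Made-up inventory for illustration only.
inventory = [
    ("Sales dashboards", "business_intelligence"),
    ("Orders database", "data_store"),
    ("Nightly ETL jobs", "data_transformation"),
    ("Customer logical model", "conceptual_model"),
]

for name, category in plan_harvest(inventory):
    print(f"{category}: {name}")
```

Each entry in the resulting plan would then get its own bridge configuration and harvest.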

You should also organize your metadata repository with labeled folders, for example for each category of metadata.

Browsing the file system

Many import actions require pointing to files on the application server.

When configuring Talend Data Catalog, you have to specify the precise locations on the file system to include in the browse list.

You can specify the locations using Setup.bat or the command line.

The drives available for browsing are controlled by the conf.properties file.
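As a conceptual illustration of a restricted browse list, the following sketch checks whether a file path falls under one of a set of allowed root folders. It does not read conf.properties or call any Talend Data Catalog API; the function name `is_browsable` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_browsable(path, allowed_roots):
    """Return True if path equals or sits under one of allowed_roots."""
    candidate = PurePosixPath(path)
    for root in allowed_roots:
        root_path = PurePosixPath(root)
        # Allowed if the path is the root itself or has the root as an ancestor.
        if candidate == root_path or root_path in candidate.parents:
            return True
    return False


# Hypothetical folders exposed for browsing.
roots = ["/data/imports", "/opt/models"]

print(is_browsable("/data/imports/erwin/sales.xml", roots))
print(is_browsable("/etc/passwd", roots))
```

The sketch compares paths lexically and does not resolve `..` segments or symbolic links.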

Imported models and custom models

There are two types of models in the repository:
  • Imported models are the models associated with an import bridge and populated through the model harvesting process. These models are referred to as technical models. They are also considered business models when imported from business applications or business intelligence (BI) tools.
  • Custom models are instantiations of a custom model type in the metamodel. They are referred to as business models. They are also considered technical models depending on the domains.