Configuring Talend Data Catalog Metadata Harvesting - 7.1

Talend Data Catalog Installation and Upgrade Guide for Windows

Version: 7.1
Language: English (United States)
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Catalog
Content: Installation and Upgrade

Metadata Harvesting from third-party databases and from Data Modeling, Data Integration, or Business Intelligence tools is performed by the integrated Talend Data Catalog Metadata Harvesting solution.

By default, the installer deploys and configures both Talend Data Catalog and Talend Data Catalog Metadata Harvesting on the same machine, and the Talend Data Catalog Application Server accesses the Metadata Harvesting Web Services locally.

Metadata Harvesting can also be installed and configured as a remote Metadata Harvesting agent on another machine. This is useful in deployments where the metadata management server runs:

  • in the cloud and needs to access metadata harvesting servers (agents) on premises, or
  • on Linux and needs to access metadata harvesting servers (agents) on a Windows machine, because some data modeling, data integration, or business intelligence client tools run only on Windows (for example, tools with a COM-based SDK).

Essential customizations of the Metadata Harvesting Application Server (such as directories and memory) can be made in the <TDC_HOME>\TalendDataCatalog\conf\conf.properties configuration file.
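Because these settings live in a plain-text file, it can help to inspect the current values before changing them. The Python sketch below reads conf.properties into a dictionary. It assumes simple KEY=VALUE lines with comments starting with '#' or '!', keeps values raw (it does not apply Java properties escape processing, so Windows paths keep their backslashes), and assumes UTF-8 encoding. Whether Talend's own loader treats backslashes the same way is not covered by this guide. Both function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def read_conf_properties(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines into a dict (simplified, hypothetical helper).

    Blank lines and lines starting with '#' or '!' are skipped. Values are kept
    raw: no Java properties escape processing, so backslashes are preserved.
    """
    settings: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#!":
            continue  # blank line or comment
        key, sep, value = stripped.partition("=")
        if sep:  # lines without '=' are ignored
            settings[key.strip()] = value.strip()
    return settings


def load_conf(tdc_home: str) -> dict[str, str]:
    """Read <TDC_HOME>\\TalendDataCatalog\\conf\\conf.properties (UTF-8 assumed)."""
    conf_file = Path(tdc_home) / "TalendDataCatalog" / "conf" / "conf.properties"
    return read_conf_properties(conf_file.read_text(encoding="utf-8"))
```

For example, load_conf(tdc_home)["M_BROWSE_PATH"] returns the raw browse path, where tdc_home is the installation directory referred to as <TDC_HOME>.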
The main parameters are:
M_BROWSE_PATH: defines the local drives and network paths that can be browsed.

All file and directory references in metadata harvesting parameters are resolved on the server, not on the user's machine. The server must be able to access these resources whenever an event that uses them occurs (such as a scheduled harvest). When harvesting a model, the user interface presents a set of paths that can be browsed to select these files and directories.

The M_BROWSE_PATH parameter defines which drives and network paths are available in the user interface. You can update M_BROWSE_PATH either through the user interface presented by setup.bat (on the application server) or by editing the <TDC_HOME>\TalendDataCatalog\conf\conf.properties file directly.

After installation, the set includes all directly attached drives, which is specified by an asterisk (M_BROWSE_PATH=*).

On Windows-based application servers running as a service, the drive names (including mapped drives) and paths may differ from those a user sees when logged in, so the "*" value may not expose all the drives you expect when browsing in the UI. Instead, explicitly list all the drives and network paths that you want to make available to all users in the UI.

It is also not sufficient to enter a mapped drive letter (for example N:\), because drive mappings are generally not available to services either. Specify physical drives by letter, but specify network paths in full (UNC form), for example:

M_BROWSE_PATH=C:\, E:\, \\network-drive\shared\

Note: This also applies to the drives used by the backup and restore scripts.
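The M_BROWSE_PATH rules above can be checked mechanically before restarting the server. The following Python sketch assumes entries are separated by commas, as in the example, and sorts a value into the "*" wildcard, drive roots such as C:\, full UNC network paths starting with \\, and anything else for manual review. It cannot tell a physical drive letter from a mapped one from the text alone, so that check still has to be done on the server. classify_browse_path is a hypothetical helper.

```python
from __future__ import annotations

import re

# A drive root such as "C:" or "C:\" (letter, colon, optional trailing backslash).
DRIVE_ROOT = re.compile(r"^[A-Za-z]:\\?$")


def classify_browse_path(value: str) -> dict[str, list[str]]:
    """Split an M_BROWSE_PATH value into wildcard, drive roots, UNC paths, other.

    Assumes comma-separated entries. 'other' collects anything that is neither
    a bare drive root nor a UNC path (for example a folder under a drive letter).
    """
    result: dict[str, list[str]] = {"all": [], "drives": [], "unc": [], "other": []}
    for entry in (part.strip() for part in value.split(",")):
        if not entry:
            continue  # tolerate stray commas
        if entry == "*":
            result["all"].append(entry)
        elif DRIVE_ROOT.match(entry):
            result["drives"].append(entry)
        elif entry.startswith("\\\\"):  # UNC paths begin with two backslashes
            result["unc"].append(entry)
        else:
            result["other"].append(entry)
    return result
```

Running it on the example value above puts C:\ and E:\ under "drives" and \\network-drive\shared\ under "unc".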
M_DATA_DIRECTORY: sets the location of data such as the log files and the incremental metadata harvesting cache. Relocate this data as needed when harvesting very large Data Integration or Business Intelligence tools.
M_JAVA_OPTIONS: increases the maximum memory used by Java bridges when harvesting metadata from very large databases, Data Modeling, Data Integration, or Business Intelligence tools.

This parameter defines the default maximum for all Java bridges. However, most memory-intensive Java bridges (such as the JDBC bridges) can define their own maximum memory in their last parameter, called Miscellaneous.
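To see which default maximum a given M_JAVA_OPTIONS value sets, the sketch below extracts the heap limit in MiB. It assumes the parameter holds standard JVM flags and that the maximum is expressed with -Xmx (for example -Xmx4G); the actual default value shipped in conf.properties is not given in this guide. When -Xmx appears more than once, the JVM uses the last occurrence, and the sketch does the same. max_heap_mb is a hypothetical helper.

```python
from __future__ import annotations

import re

# Assumption: M_JAVA_OPTIONS holds standard JVM flags and the heap limit is -Xmx<size>.
# The token must stand alone (whitespace or string boundary on both sides).
XMX = re.compile(r"(?<!\S)-Xmx(\d+)([kKmMgG]?)(?!\S)")

# Size of one unit expressed in MiB; no suffix means bytes.
MIB_PER_UNIT = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def max_heap_mb(java_options: str) -> float | None:
    """Return the effective -Xmx value in MiB, or None if -Xmx is absent."""
    matches = XMX.findall(java_options)
    if not matches:
        return None
    number, unit = matches[-1]  # the JVM applies the last -Xmx it sees
    return int(number) * MIB_PER_UNIT[unit.lower()]
```

For instance, max_heap_mb("-Xms1G -Xmx4G") returns 4096. A bridge that sets its own maximum in its Miscellaneous parameter is not reflected by this value.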

When the Metadata Harvesting Application Server is used as a local metadata harvesting agent connected to a Talend Data Catalog Application Server in the cloud, additional customizations are needed in the <TDC_HOME>\TalendDataCatalog\conf\agent.properties configuration file.
The parameters are:
M_SERVER_URL: the URL of the Talend Data Catalog Application Server in the cloud, such as http://<server>:11480/MM.
M_AGENT_NAME: the agent name, such as MyCompanyOnPremise, that the Talend Data Catalog Application Server uses to refer to this metadata harvesting agent.
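The two agent.properties entries can be generated and sanity-checked with a small script. This Python sketch only validates that M_SERVER_URL is an http or https URL with a host and that M_AGENT_NAME is non-empty and contains no whitespace; the whitespace rule is a conservative assumption, not a documented restriction. agent_properties and the example host are hypothetical.

```python
from urllib.parse import urlparse


def agent_properties(server_url: str, agent_name: str) -> str:
    """Build M_SERVER_URL and M_AGENT_NAME lines after basic sanity checks."""
    parsed = urlparse(server_url)
    # Require an explicit http(s) scheme and a host, e.g. http://<server>:11480/MM.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"not an http(s) URL with a host: {server_url!r}")
    # Assumed rule: the agent name is a single non-empty token.
    if not agent_name or any(ch.isspace() for ch in agent_name):
        raise ValueError(f"agent name must be non-empty without spaces: {agent_name!r}")
    return f"M_SERVER_URL={server_url}\nM_AGENT_NAME={agent_name}\n"


if __name__ == "__main__":
    print(agent_properties("http://tdc.example.com:11480/MM", "MyCompanyOnPremise"), end="")
```

Merge the returned lines into the existing agent.properties rather than overwriting the file, since it may hold other settings.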