Metadata Harvesting from third-party databases, Data Modeling, Data Integration or Business Intelligence tools is performed by the integrated Talend Data Catalog Metadata Harvesting solution.
By default, the installer software deploys and configures both Talend Data Catalog and Talend Data Catalog Metadata Harvesting on the same machine, where Talend Data Catalog Application Server accesses Metadata Harvesting Web Services locally.
Metadata Harvesting can also be installed and configured as a remote Metadata Harvesting agent on another machine. This is useful when the metadata management server is deployed:
- remotely in the cloud and needs to access metadata harvesting servers (agents) locally on premises, or
- on Linux and needs to access metadata harvesting servers (agents) on a Windows machine because the data modeling, data integration or business intelligence client tools are Windows only (for example, tools with a COM-based SDK).
|M_BROWSE_PATH||to browse local and mapped network drives.
All metadata harvesting file and directory parameter references are relative to the server. The server must have access to these resources whenever a later event (such as a scheduled harvest) occurs. When harvesting a model, the user interface presents a set of paths that can be browsed to select these files and directories.
Setting the M_BROWSE_PATH parameter defines which drives and network paths are available in the user interface. You can update M_BROWSE_PATH using the user interface presented by setup.sh on the application server, or by editing the <TDC_HOME>/TalendDataCatalog/conf/conf.properties file directly.
On installation, the set includes all directly attached drives, which is specified by an asterisk "*" (M_BROWSE_PATH=*).
For Windows-based application servers running as a service, the mapped drive names and paths may differ from what a logged-in user sees, so the "*" value may not expose all the drives you expect when browsing in the UI. Instead, you must explicitly list all the drives and network paths that you want to be available to all users in the UI.
Also, it is not sufficient to enter the mapped drive letter (for example N:\), because that drive mapping is generally not available to services either. Specify physical drives by letter, and specify network paths in full, for example:
M_BROWSE_PATH=C:\, E:\, \\network-drive\shared\
Note: This also applies to the drives used by the backup and restore scripts.
|M_DATA_DIRECTORY||to relocate data such as the log files and the incremental metadata harvesting cache, as may be needed when harvesting from very large Data Integration or Business Intelligence tools.|
|M_JAVA_OPTIONS||to increase the maximum memory used by Java bridges during the metadata harvesting of very large databases, Data Modeling, Data Integration or Business Intelligence tools.
This parameter defines the default maximum for all Java bridges. However, the most memory-intensive Java bridges (such as JDBC bridges) can define their own maximum memory in their last parameter, called Miscellaneous.|
|M_SERVER_URL||is the URL of the Talend Data Catalog Application Server on the cloud, such as http://<server>:11480/MM.|
|M_AGENT_NAME||is the agent name, such as MyCompanyOnPremise, that the above Talend Data Catalog Application Server will then use to refer to this metadata harvesting server agent.|
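
As an illustration only, the parameters above might be combined as follows on a remote harvesting agent. This is a minimal sketch that assumes these parameters are set in the same conf.properties file as M_BROWSE_PATH. The data directory path and the -Xmx value are hypothetical placeholders, not recommended settings; the server URL and agent name reuse the examples given in the table, and <server> is left as a placeholder to be replaced with the actual host.

```properties
# Sketch of a remote agent configuration (illustrative values only).

# Hypothetical location for log files and the incremental harvesting cache.
M_DATA_DIRECTORY=/opt/tdc-data

# Default maximum heap for all Java bridges; -Xmx is the standard JVM
# option for maximum heap size. The 4G value is an assumption.
M_JAVA_OPTIONS=-Xmx4G

# URL of the Talend Data Catalog Application Server (example from the table).
M_SERVER_URL=http://<server>:11480/MM

# Name the application server uses to refer to this agent (example from the table).
M_AGENT_NAME=MyCompanyOnPremise
```

Because M_JAVA_OPTIONS only sets the default for all Java bridges, a single memory-intensive bridge can still be given a larger limit through its own Miscellaneous parameter, without raising the default for every other bridge.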