Cleaning up the repository - 8.0

Talend Data Catalog Installation and Upgrade Guide

Version: 8.0
Language: English
Operating system: Windows
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Catalog
Content: Installation and Upgrade
Last publication date: 2024-01-26

Clean up the repository before you upgrade the Talend Data Catalog server.

The repository can contain obsolete or unused content. Because this content is live and indexed, it takes up database space and affects database performance.

Ensure that the database has at least 20% free space. The upgrade process may take several hours on large repositories and also needs extra space for temporary data during the migration.
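The 20% free-space requirement can be checked with simple arithmetic. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes you have already obtained the total and used sizes of the database from your database administrator or from the database's own tools.

```python
def free_space_ratio(total_bytes: int, used_bytes: int) -> float:
    """Return the fraction of the database storage that is still free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 <= used_bytes <= total_bytes:
        raise ValueError("used_bytes must be between 0 and total_bytes")
    return (total_bytes - used_bytes) / total_bytes


def has_enough_free_space(total_bytes: int, used_bytes: int,
                          min_free_ratio: float = 0.20) -> bool:
    """Return True when at least min_free_ratio (20% by default) is free."""
    return free_space_ratio(total_bytes, used_bytes) >= min_free_ratio
```

For example, a 500 GB database with 450 GB used has only 10% free space and does not meet the requirement.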

Work with your repository database administrator to ensure that the database is cleaned.

Here are the actions you can perform to free up some space.

Deleting unused test or sandbox type content

  1. Browse through the repository to identify the unused content.
  2. Delete it from the repository manager.

Deleting unused versions of configurations

You may keep copies of configurations for backup or historical analysis purposes, or your configuration management process may create a new version each time the entire metadata is harvested.

These copies take up database space and can degrade performance. They also consume resources such as disk space and index size, and can slow down searches.

You should delete the old and unused versions of configurations:
  1. Go to Manage > System.
  2. Run the Get repository configuration statistics operation from the Operations drop-down list.

    If the number of configuration versions is large compared to the total number of configurations, perform the following steps.

  3. Browse through the repository to identify the older versions.
  4. Delete them from the repository manager.
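The ratio check in step 2 can be sketched as follows. This is a hedged illustration, not part of Talend Data Catalog: the function names are hypothetical, the two counts are assumed to be read manually from the statistics operation output, and the threshold of 5 versions per configuration is an arbitrary assumption, since this guide does not define what counts as a large ratio.

```python
def versions_per_configuration(version_count: int,
                               configuration_count: int) -> float:
    """Average number of versions kept for each configuration."""
    if configuration_count <= 0:
        raise ValueError("configuration_count must be positive")
    return version_count / configuration_count


def needs_version_cleanup(version_count: int, configuration_count: int,
                          threshold: float = 5.0) -> bool:
    """Return True when the ratio exceeds the threshold.

    The default threshold is an assumption chosen for illustration only.
    """
    return versions_per_configuration(version_count, configuration_count) > threshold
```

Choose a threshold that matches your own retention policy before relying on such a check.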

Deleting unused versions of models

For the same reasons as for configuration versions, you should delete the old and unused versions of models:
  1. Browse through the repository to identify the older versions.
  2. Go to Manage > Schedules.
  3. Configure and run the Delete unused versions operation.

    This operation deletes a version of a model if the version is not used in any version of a configuration, was imported more than an hour ago, and is older than the specified number of days.
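The deletion rule described above can be expressed as a small predicate. This Python sketch only models the rule as stated in this section; it does not call Talend Data Catalog, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def is_version_deletable(used_in_configuration: bool, imported_at: datetime,
                         now: datetime, retention_days: int) -> bool:
    """Model of the rule: unused, imported over an hour ago, older than N days."""
    if used_in_configuration:
        # A version referenced by a configuration version is always kept.
        return False
    age = now - imported_at
    # Both age conditions must hold: the one-hour guard protects fresh imports
    # even when retention_days is set to 0.
    return age > timedelta(hours=1) and age > timedelta(days=retention_days)
```

With a retention of 7 days, an unused version imported 30 days ago qualifies for deletion, while one imported 3 days ago does not.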

Verifying that the incremental harvesting option is enabled in the model setup

The incremental harvesting option saves processing time during the import and consumes less space. Only the part of the model that has changed is re-imported and written as a new version to the repository database. The rest of the content is reused in the new version. It applies to large databases, file systems and Business Intelligence servers.

This option can be disabled manually by adding the -cache.clear option in the Miscellaneous parameter.

For each large model where the option is available, you should verify that it has not been disabled manually:
  1. Open the import setup of each model.
  2. If you see the -cache.clear option in the Miscellaneous parameter, remove it.
  3. Save your changes.
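If you keep a copy of each model's Miscellaneous parameter value as text, the check in step 2 can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, the value is treated as a whitespace-separated list of options, and the sketch does not read or write the import setup itself.

```python
from typing import Tuple


def remove_cache_clear(miscellaneous: str) -> Tuple[str, bool]:
    """Return the value without -cache.clear and whether it was present.

    Assumes options are separated by whitespace. Runs of whitespace are
    collapsed to single spaces, so review values that contain quoted text.
    """
    tokens = miscellaneous.split()
    kept = [token for token in tokens if token != "-cache.clear"]
    return " ".join(kept), len(kept) != len(tokens)
```

The second element of the result tells you which models need to be edited and saved in the import setup.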

Deleting the operation logs

Operation logs are not indexed and should not affect performance, but they can take up significant space. It applies to large databases, file systems and Business Intelligence servers.

You should delete the operation logs:
  1. Go to Manage > Schedules.
  2. Configure and run the Delete operation logs operation.

    This operation deletes completed operations and their logs that are older than a specified number of days. You can delete only the logs of failed operations, or the logs of both successful and failed operations.
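The selection performed by this operation can be modeled as a filter on completion date and status. The Python sketch below is illustrative only: the OperationLog class, its fields and the status labels are hypothetical and do not reflect how Talend Data Catalog stores its logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class OperationLog:
    name: str
    status: str            # assumed labels: "succeeded" or "failed"
    completed_at: datetime


def select_logs_to_delete(logs: List[OperationLog], now: datetime,
                          older_than_days: int,
                          failed_only: bool = False) -> List[OperationLog]:
    """Return the completed operation logs older than the given number of days."""
    cutoff = now - timedelta(days=older_than_days)
    return [
        log for log in logs
        if log.completed_at < cutoff
        and (not failed_only or log.status == "failed")
    ]
```

Setting failed_only to True mirrors the choice of deleting only the logs of failed operations.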

Disabling the Debug logging option in Manage System

Debug logs are not indexed and should not affect performance, but they can take up significant space. It applies to large databases, file systems and Business Intelligence servers.

If you enabled this option for testing or for reporting a ticket, you should disable it once you have finished.
  1. Go to Manage > System.
  2. In the Debug logging field, select Disable from the drop-down list.

Running the database maintenance operation

Run the database maintenance operation to complete the cleanup actions you performed in the previous sections.
  1. Go to Manage > Schedules.
  2. Configure and run the Run database maintenance operation.

    This operation maintains the database indexes and statistics.

    If you deleted a large number of contents and versions at once, you should run the operation several times.

You are now ready to update Talend Data Catalog with the latest patches.