Managing MDM databases - 6.4

Talend MDM Platform Installation Guide for Windows

EnrichVersion
6.4
EnrichProdName
Talend MDM Platform
task
Installation and Upgrade
EnrichPlatform
Talend Activity Monitoring Console
Talend Administration Center
Talend Artifact Repository
Talend CommandLine
Talend Data Preparation
Talend Data Stewardship
Talend DQ Portal
Talend ESB
Talend Identity Management
Talend Installer
Talend JobServer
Talend Log Server
Talend MDM Server
Talend MDM Web UI
Talend Project Audit
Talend Repository Manager
Talend Runtime
Talend SAP RFC Server
Talend Studio

The list of available databases depends on which version of Talend MDM you are using.

For more information on compatible databases, see Compatible Databases.

Note that, if you are working with an Oracle database, you must make sure that the Oracle user which Talend MDM uses to connect to the database has not been assigned the DBA role, or else you may encounter unexpected issues.

Managing SQL databases

You can store the MDM repository and your master data records in a SQL database. For licensing reasons, if you want to use any of the supported SQL databases other than the H2 database embedded with the product, you need to download and manually install the database drivers in <TomcatPath>\webapps\talendmdm\WEB-INF\lib before using Talend products.

For more information on compatible databases, see Compatible Databases.

Connecting to the H2 embedded database

MDM Server includes a web console that you can use to access the H2 embedded database directly in order to perform administration tasks. For more information about how to use the H2 database and web console, refer to the H2 database documentation at http://www.h2database.com.

To connect to the H2 web console, do the following.

  1. In a web browser, enter the address of the H2 console (for example, http://localhost:8180/talendmdm/h2console/console).

  2. In the login screen that opens, enter the Username and Password for an MDM user with administrator rights (for example, administrator/administrator, or admin/talend), and then click Login.

    The H2 console connection screen opens.

  3. Enter the connection information relative to your database, and then click Connect.

    The H2 console opens with access to the MDM database.

Working with a Staging Area

Talend MDM implementations where master data records are stored in a SQL database include a Staging Area in which MDM users can load data that is then asynchronously validated.

This Staging Area is a mirror of the SQL storage area containing the master data records. The only difference is that the staging area database does not include constraints in its schema (such as not null values or foreign key integrity). However, this does not mean that records with invalid foreign keys can be loaded into the Master database if Foreign Key Integrity is enabled.

For each data container (a data container being an instance of a SQL storage), a second SQL storage also exists which is the Staging Area for the data container in question. For example, if you have a data container called Product, this means that a data container called Product#STAGING also exists.

MDM users can load data into the Staging Area in the following ways:

  • using SQL components: execute INSERT statements on the database, using JDBC or Data Integration components

  • using MDM components: use the tMDMOuput and tMDMBulkload components and specify that the data container is "data_container_name#STAGING" (for example, "Product#STAGING" instead of "Product")

    Note that tMDMOutput does not allow the insertion or update of non-valid data.

Talend MDM provides both a user interface and REST access to trigger the transfer from the Staging Area to the master database. This transfer is called a staging area validation task because it includes a step where records from the Staging Area are validated against MDM validation rules (such as XSD, Security and Validation rules).

For more details on how to trigger this transfer from the Talend MDM Web User Interface, see the Talend MDM Web User Interface Guide.

Key differences between Staging Area and master database

There are relatively few differences between the database schema of the master database and the Staging Area.

Difference

Description

No Foreign Key relations

In the Staging Area, all Foreign Key relations are disabled. This is to allow users to load data without taking into account the relationships between entities.

Additional column for source

In the Staging Area, the database schema has an additional text column where MDM users can provide details about the origin of the record. This column is free-form and is not mandatory.

For example, when loading data in the Staging Area, a user may indicate that records with the ID 1 come from SAP and the records with the ID 2 come from another legacy system.

Additional column for status

In the Staging Area, this column is used to store actions carried out by MDM on the staging record. The column contains a code:

  • "000" or null means "new record".

  • "2nn" values (200, 201...) mean the record successfully passed a step of record validation.

  • "4nn" values (400, 401...) mean the record failed to pass a step of record validation.

Configuring MDM Server

The mdm.conf file stored in the <$INSTALLDIR>\conf directory contains all the configuration information for MDM Server. You can edit some of the information in this file to change the behavior of MDM Server in certain cases.

Editing the mdm.conf file

You can edit the file mdm.conf to customize the configuration for the MDM server according to your needs.

The parameters and their default values available in the file are listed below:

Settings

Parameters and Default Values

Usage

System settings

admin.user=admin

admin.password=talend

Specifies the credentials for the default administrator users of the MDM server.

The password is encrypted upon the startup of the MDM server. For more information about how to change the default password, see Managing the passwords in configuration files.

system.data.auto.init=true

Indicates whether the MDM server will automatically initialize the system data during startup.

system.data.auto.upgrade=true

Indicates whether the MDM server will automatically upgrade the system data.

system.data.force.upgrade=false

Indicates whether the migration tasks will be executed even when the new system Version Number is equal to or less than the current system Version Number.

system.locale.default=en

Specifies the default system locale.

max.export.browserecord=1000

Indicates the maximum number of records to be exported on the Talend MDM Web User Interface.

max_cache_size=5000

Indicates the maximum size of the cache.

cluster_override=false

Indicates whether to override the existing content in the database for the system container when starting the MDM server.

RDBMS Settings

db.autoPrepare=true

Indicates to the MDM server whether it should try to create a database on its own or not.

db.datasources=${mdm.root}/conf/datasources.xml

Points to a file that provides the database connection information.

db.default.datasource=H2-Default

Indicates to the MDM server which default datasource specified in db.datasources should be used.

Subscription Engine

subscription.engine.autostart=true

Indicates whether JMS (Java Message Service) in the system will be started automatically.

system.cluster.storage.index.replication=true

Indicates whether to apply index replication per storage.

mdm.routing.engine.broker.url = vm://localhost

Specifies the connection URL for the JMS service.

routing.engine.max.execution.time.millis=300000

Indicates the maximum milliseconds a routing order can take to execute. The value of "0" indicates the routing order never expires.

For example, the value of "300000" is equal to 5 minutes.

mdm.routing.engine.consumers=5-10

Indicates the number of simultaneous routing order consumers that can be concurrently processed.

Its value supports a number range. For example, "5-10" means that at least 5 consumers will be present, with a maximum of 10 if all previous consumers are busy.

mdm.routing.engine.broker.sessionCacheSize=1

Indicates the number of cached JMS sessions per session acknowledgement type (auto, client, dups_ok, transacted).

If needed, you can increase this number in a high-concurrency environment.

Bonita user synchronization

scheduler.enabled=true

Indicates whether to enable the periodic batch job for synchronization with the Bonita server.

scheduler.retryCount=10

Specifies the number of retries when the synchronization job fails.

scheduler.retryInterval.seconds=60

Specifies the interval of time in seconds between each retry attempt.

scheduler.period.seconds=3600

Specifies the period of the synchronization job.

scheduler.initialDelay.seconds=10

Specifies the delay time allowed for the initial synchronization job after the system startup.

scheduler.shutdownTimeout.seconds=30

Indicates the time allowed for the system to gracefully shut down the synchronization job when the job is completed.

realtime.synchronize.enabled=true

Indicates whether to enable realtime synchronization.

Bonita technical user/password

technical.user=install

technical.password=install

Specifies credentials for Bonita technical users.

The password is encrypted upon the startup of the MDM server. For more information about how to change the default password, see Managing the passwords in configuration files.

Note

Bonita technical users are used by MDM to connect to the Bonita BPM server and then perform operations such as deploying workflows and synchronizing workflow users. For more information about how to configure the connection between the MDM server and the Bonita BPM server, see Installing the Bonita BPM server manually.

Improving performance when loading data from the Staging Area

The mdm.conf file, which is stored in the <$INSTALLDIR>\conf directory, includes several properties which can be edited to improve performance.

Property

Type

Description

staging.validation.updatereport

boolean

When you run the validation process that loads data from the Staging Area to Talend MDM, by default this process generates update reports that record inserts in the Journal and fire any relevant Triggers.

To instruct MDM Server not to generate update reports when loading data from the Staging Area, set this property to false.

staging.validation.pool

int

By default, MDM Server assigns two threads to the validation process that loads data from the Staging Area to Talend MDM.

If the machine running MDM Server has some spare CPU, edit the value of this property to increase the number of threads that can be used.

staging.validation.commit

int

By default, when loading data from the Staging Area to Talend MDM, Talend MDM commits records to the master database every 1000 validated objects.

To improve performance, increase this value to reduce the number of commits made to the database.

staging.validation.buffer.threshold

int

A buffer is used to transfer records from the Staging Area to the master database. By default, the buffer holds a maximum of 1000 records; once this threshold is reached, the action of reading from the Staging Area will be paused and a check will be made every second to see if the number of records in the buffer has decreased.

Since reading from the Staging Area is always faster than writing to the master database, reduce the value of the buffer size if you encounter memory issues.

Managing the passwords in configuration files

For the sake of security, the passwords are encrypted in the configuration files datasources.xml, tdsc-database.properties, and mdm.conf under the directory <MDM_ROOT>/conf/, so that the actual password string is not shown in plain text.

Here MDM_ROOT refers to where the MDM HOME file is placed after you install the MDM modules. For more information about how to install MDM modules, see Installing the MDM server and Talend Data Stewardship Console manually.

  • datasources.xml: This file includes the passwords for Talend MDM to access different kinds of databases. For an example about how to use the datasources.xml file, see Managing how MDM Server uses SQL databases.

  • mdm.conf: This file contains all the configuration information for MDM Server, which includes two passwords: admin.password and technical.password. For more information, see Configuring MDM Server.

  • tdsc-database.properties: This file includes the password to access the database used by Talend Data Stewardship Console. For more information, see Talend Data Stewardship Console User Guide.

    The Talend Data Stewardship Console is deprecated since Talend 6.4. Consider migrating to Talend Data Stewardship.

If needed, you can use a plain text password to change the encrypted default password.

Alternatively, you can encrypt a password using the CommandLine first and then use it in the configuration files directly. For more information, see Encrypting the passwords using the CommandLine.

The following example shows how to use a plain text password to change the default password of the default administrator user of the MDM server in the file mdm.conf.

  1. Open the file <MDM_ROOT>/conf/mdm.conf.

  2. Locate the encrypted default admin password.

    admin.password=aYfBEdcXYP3t9pofaispXA==,Encrypt
  3. Remove the existing password, including ",Encrypt", and then enter a new plain text password for the default administrator user.

  4. Save your changes and close the file.

    Upon the next startup of the MDM server, the new password will be encrypted and the file will be updated with this encrypted new password.

Indexing databases

This section describes some changes you can make to how your databases are indexed.

Regenerating full-text indexes

When you store your data records in a SQL database, you may at times need to regenerate full-text indexes, for instance to deal with index desynchronization issues.

To regenerate a full-text index when using a SQL database, do the following:

  • Enter the following HTTP call in your web browser:

    http://server:port/talendmdm/services/run?action=reindex&container=<name>

    where server:port is the server and port on which Talend MDM is installed (such as localhost:8180), name refers to the name of the data container for which you want to regenerate the index (such as Product)).

Managing how MDM Server uses SQL databases

The datasources.xml file stored in the <$INSTALLDIR>\conf directory contains information about how Talend MDM uses SQL databases, including where they are located and how to tune their usage. The initial version of this file is created as part of the installation process, but administrators can edit the file as needed.

Managing database search modes

In Talend MDM, the master data records can be stored in a SQL database, and a second SQL storage also exists which serves as the Staging Area for the master database. For more information, see Working with a Staging Area.

Searches in the master database support both full text search and standard SQL search, while searches in the staging database only support standard SQL search.

Standard SQL search considers the search content as an integrate phrase to be matched, while full text search interprets the search content as a phrase which contains multiple words, and the records that match any of the words will be returned. For example, if the search content is "test first name", full text search considers it as three independent words "test", "first", and "name", while standard SQL search considers it as one phrase "test first name".

By default, the full text search mode is used for searches in the master database.

If needed, you can change the default settings in the file <$INSTALLDIR>\conf\datasources.xml to optimize the master database searches.

The following example shows how to use standard SQL search for searches in the master database:

  1. Open the file <$INSTALLDIR>\conf\datasources.xml.

  2. Add the option contains-optimization with the value like following the option fulltext-index-directory. For example:

    <fulltext-index-directory>E:\Talend-MDMServer\data
    \indexes\H2-Default</fulltext-index-directory>
    <contains-optimization>like</contains-optimization>
  3. Save your changes.

The value for the option contains-optimization can be:

Value

Description

like

This is the default value when no full-text indexes are set. In this case, the standard SQL search mode is used for searches in the master database.

disabled

With this value set, an exception will be thrown if MDM encounters a condition with a CONTAINS.

fulltext

This is the default value when full-text indexes are available. In this case, the full text search mode is used for searches in the master database.

If the option contains-optimization is set to fulltext, the MDM server will perform case insensitive searches no matter what the value of case-sensitive-search is. For more information, see Managing the case-sensitivity in database searches.

Managing the case-sensitivity in database searches

Depending on the configuration of your SQL database, searches in the database may be case sensitive.

To make searches case insensitive, set the value of the <case-sensitive-search> option in the <$INSTALLDIR>\conf\datasources.xml file to false.

For example:

<datasources xmlns="http://www.talend.com/mdm/datasources">
    <datasource name="MySQL-DS1">
        <master>
            <type>RDBMS</type>
            <rdbms-configuration>
                ...
                <case-sensitive-search>false</case-sensitive-search>
                ...
            </rdbms-configuration>
        </master>
    </datasource>
</datasources>

If this option is not available, all searches performed against the database are case sensitive.