Centralizing MongoDB metadata - 6.3

Talend Real-time Big Data Platform Studio User Guide


If you often work with data stored in a MongoDB database, you can centralize the database connection and the schema details in the Metadata folder of the Repository tree view.

The MongoDB metadata setup procedure consists of two separate but closely related tasks:

  1. Create a connection to a MongoDB database.

  2. Retrieve MongoDB schemas of interest.

Prerequisites:

  • All the required external modules that are missing in Talend Studio due to license restrictions have been installed. For more information, see the Talend Installation Guide.

Creating a connection to a MongoDB database

  1. In the Repository tree view, expand the Metadata node, right-click NoSQL Connection, and select Create Connection from the contextual menu. The connection wizard opens up.

  2. In the connection wizard, fill in the general properties of the connection you need to create, such as Name, Purpose and Description.

    The text you enter in the Description field appears as a tooltip when you move your mouse pointer over the connection.

    When done, click Next to proceed to the next step.

  3. Select MongoDB from the DB Type list and the version of the database you are connecting to from the DB Version list, then specify the following details:

    • Enter the host name or IP address and the port number of the MongoDB server in the corresponding fields.

      If the database you are connecting to is replicated on different hosts of a replica set, select the Use replica set address check box, and specify the host names or IP addresses and the respective ports in the Replica set address table. This can improve data handling reliability and performance.

    • If you want to restrict your MongoDB connection to a particular database only, enter the database name in the Database field.

      If you leave this field blank, the wizard will list the collections of all the existing databases on the connected server when you retrieve schemas.

    • If your MongoDB server requires authentication for database access, select the Require authentication check box and provide your username and password in the corresponding fields.

  4. Click the Check button to make sure that the connection works.

  5. Click Finish to validate the settings.

    The newly created MongoDB database connection appears under the NoSQL Connection node in the Repository tree view. You can now drop it onto your design workspace as a MongoDB component, but you still need to define the schema information where needed.

    Next, you need to retrieve one or more schemas of interest for your connection.
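For readers who also reach the same server from scripts, the wizard settings above (host and port, optional replica set addresses, optional database, optional credentials) correspond closely to the parts of a standard MongoDB connection string. The following is a minimal sketch of that mapping using only the Python standard library. The function name `build_mongodb_uri`, the host names, and the credentials are hypothetical, and the sketch makes no claim about how Talend Studio builds its own connection internally.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(hosts, database="", username=None, password=None):
    """Build a mongodb:// URI from wizard-style settings (illustrative only).

    hosts: list of (host, port) pairs. More than one pair plays the role of
    the Replica set address table in the wizard.
    database: optional database name; empty means "no particular database".
    """
    if not hosts:
        raise ValueError("at least one (host, port) pair is required")

    # Credentials must be percent-encoded inside a MongoDB URI.
    credentials = ""
    if username is not None:
        credentials = f"{quote_plus(username)}:{quote_plus(password or '')}@"

    host_list = ",".join(f"{host}:{port}" for host, port in hosts)
    return f"mongodb://{credentials}{host_list}/{database}"


# Single server, restricted to one database (hypothetical values).
single = build_mongodb_uri([("localhost", 27017)], database="sales")

# Two replica set members with authentication (hypothetical values).
replicated = build_mongodb_uri(
    [("db1.example.com", 27017), ("db2.example.com", 27018)],
    username="talend",
    password="p@ss/word",
)

print(single)
print(replicated)
```

A string built this way can typically be passed to a MongoDB driver to verify connectivity outside the Studio, which is roughly what the Check button does inside the wizard.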

Retrieving schemas

In this step, we will retrieve the schemas of interest from the connected MongoDB database.

  1. In the Repository view, right-click the newly created connection and select Retrieve Schema from the contextual menu.

    The wizard opens a new view that lists all the available collections of the specified database, or all the available databases if you did not specify one when creating the connection.

  2. Expand the database of interest (or the databases of interest, if you did not specify a database when creating the connection, as in this example), and select the collection or collections of interest.

  3. Click Next to proceed to the next step of the wizard where you can edit the generated schema or schemas.

    By default, each generated schema is named after the collection on which it is based.

    Select a schema from the Schema panel to display its details on the right side, and modify the schema if needed. You can rename any schema, and customize the schema structure according to your needs in the Schema area.

    The tool bar allows you to add, remove or move columns in your schema, or replace the schema with the schema defined in an XML file.

    To base a schema on another collection, select the schema name in the Schema panel, select a new collection from the Based on Collection list, and then click the Guess Schema button to overwrite the schema with that of the selected collection. You may need to click the refresh button to update the list of collections.

    To add a new schema, click the Add Schema button in the Schema panel, which creates an empty schema for you to define.

    To remove a schema, select the schema name in the Schema panel and click the Remove Schema button.

    To discard the modifications you made to the selected schema and restore its default schema, click Guess Schema. Note that all your changes to the schema will be lost if you click this button.

  4. Click Finish to complete the schema creation. The resulting schemas appear under your MongoDB connection in the Repository view. You can now drop the connection or any schema node under it onto your design workspace as a MongoDB component, with all the metadata information filled in automatically.

    If you need to further edit a schema, right-click the schema and select Edit Schema from the contextual menu to open this wizard again and make your modifications.

    Warning

    If you modify the schemas, ensure that the data type in the Type column is correctly defined.
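To see why the Type column deserves a second look, it can help to picture how a schema might be guessed from sample documents. The sketch below is an assumption-laden illustration, not the algorithm behind the Guess Schema button: the function `guess_schema`, the `TYPE_NAMES` mapping, and the sample documents are all hypothetical. It infers one type per field and falls back to a generic type when documents disagree, which is exactly the kind of column to review by hand.

```python
from datetime import datetime

# Hypothetical mapping from Python value types to Java-style column types.
TYPE_NAMES = {
    bool: "Boolean",
    int: "Integer",
    float: "Double",
    str: "String",
    datetime: "Date",
    dict: "Object",
    list: "List",
}


def guess_schema(documents):
    """Return {field: type name} inferred from sample documents.

    A field seen with more than one type falls back to "Object".
    """
    schema = {}
    for doc in documents:
        for field, value in doc.items():
            if value is None:
                # A null carries no type information; only record the field.
                schema.setdefault(field, None)
                continue
            guessed = TYPE_NAMES.get(type(value), "Object")
            previous = schema.get(field)
            if previous is None:
                schema[field] = guessed
            elif previous != guessed:
                # Conflicting types across documents: use the generic type.
                schema[field] = "Object"
    # Fields that were only ever null default to the generic type.
    return {name: (kind or "Object") for name, kind in schema.items()}


# Hypothetical sample documents from one collection.
samples = [
    {"_id": "a1", "name": "Ada", "age": 36, "active": True},
    {"_id": "a2", "name": "Bob", "age": "unknown", "tags": ["x"]},
]
schema = guess_schema(samples)
print(schema)
```

In this sample, `age` holds a number in one document and a string in another, so the sketch reports it with the generic type. Because MongoDB collections do not enforce a single type per field, a guessed schema can contain such columns, and they should be checked against the data you actually expect to process.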