Scenario: Importing data into a MongoDB database - 6.3

Talend Open Studio for Big Data Components Reference Guide


This scenario describes a Job that performs four steps in sequence: it imports data from a CSV file into a specified MongoDB collection; reads the collection to check that the import succeeded; imports data with the same structure from a JSON file into the same collection; and finally displays the contents of the collection to show that the data from the JSON file was also imported.

Dropping and linking the components

  1. Drop the following components from the Palette onto the design workspace: two tMongoDBBulkLoad components, two tMongoDBInput components, and two tLogRow components.

  2. Connect the first tMongoDBBulkLoad to the first tMongoDBInput using a Trigger > OnSubjobOk link.

  3. Connect the first tMongoDBInput to the first tLogRow using a Row > Main link.

  4. Repeat the two steps above to connect the second tMongoDBBulkLoad to the second tMongoDBInput, and the second tMongoDBInput to the second tLogRow.

  5. Connect the first tMongoDBInput to the second tMongoDBBulkLoad using a Trigger > OnSubjobOk link.

  6. Label the two tLogRow components to better identify the data displayed on the console.
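The linking steps above chain four subjobs with OnSubjobOk triggers, so each subjob starts only after the previous one has finished successfully. The following is a minimal sketch of that control flow in plain Python, not Talend-generated code; the function `run_chain` and the subjob names are hypothetical stand-ins introduced for illustration.

```python
from typing import Callable, List, Tuple


def run_chain(subjobs: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run subjobs in order; a subjob starts only if the previous one returned True."""
    executed = []
    for name, subjob in subjobs:
        executed.append(name)
        if not subjob():
            # Models an OnSubjobOk link: it does not fire after a failure.
            break
    return executed


# Hypothetical stand-ins for the four subjobs; each one simply reports success.
chain = [
    ("load CSV", lambda: True),
    ("read and log after CSV", lambda: True),
    ("load JSON", lambda: True),
    ("read and log after JSON", lambda: True),
]

print(run_chain(chain))
```

If any stand-in returns False, the subjobs after it are never executed, which is the behavior the OnSubjobOk links are meant to give the Job.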

Configuring the components

Importing data from a CSV file

  1. Double-click the first tMongoDBBulkLoad component to open its Basic settings view in the Component tab.

  2. In the MongoDB directory field, type in the MongoDB home directory. In this example, it is D:/MongoDB.

  3. In the Server and Port fields, fill in the information required for the connection to MongoDB. In this example, type in localhost and 27017.

  4. In the Database field, type in the database to import data to, bookstore in this example.

    In the Collection field, type in the collection to import data to, books in this example.

  5. Select the Drop collection if exist check box to remove the specified collection if it already exists.

  6. Browse to the desired data file from which you want to import data. In this example, it is D:/Input/books.csv, which is a standard CSV file containing four columns: id, title, author, and category.

    id,title,author,category
    1,Computer Networks,Larry Peterson,Computer Science
    2,David Copperfield,Charles Dickens,Language&Literature
    3,Life of Pi,Yann Martel,Language&Literature
    
  7. Select CSV from the File type list.

  8. Select Insert from the Action on data list.

  9. Select the First line is header check box to use the first line in the CSV file as a header.

    Select the Ignore blanks check box to ignore the blank fields (if any) in the CSV file.
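For readers who want to relate these settings to the MongoDB command line, the sketch below builds a mongoimport argument list with the same values. It assumes that the component hands the import over to the mongoimport tool found under the MongoDB directory and that each setting corresponds to the flag shown; that mapping and the `bin/mongoimport` location are assumptions for illustration, and the helper `csv_import_args` is hypothetical. The command is only assembled and printed, not executed.

```python
from typing import List


def csv_import_args(mongo_home: str, host: str, port: int, db: str,
                    collection: str, csv_path: str) -> List[str]:
    """Assemble a mongoimport command line mirroring the Basic settings above."""
    return [
        f"{mongo_home}/bin/mongoimport",  # assumed location of the tool
        "--host", host,
        "--port", str(port),
        "--db", db,
        "--collection", collection,
        "--drop",            # Drop collection if exist
        "--type", "csv",     # File type: CSV
        "--file", csv_path,
        "--headerline",      # First line is header
        "--ignoreBlanks",    # Ignore blanks
    ]


args = csv_import_args("D:/MongoDB", "localhost", 27017,
                       "bookstore", "books", "D:/Input/books.csv")
print(" ".join(args))
```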

Validating that the CSV file is imported successfully

  1. Double-click the first tMongoDBInput component to open its Basic settings view in the Component tab.

  2. From the DB Version list, select the MongoDB version you are using.

  3. In the Server and Port fields, fill in the information required for the connection to MongoDB. In this example, type in localhost and 27017.

  4. In the Database field, type in the database from which the data will be read, bookstore in this example.

  5. In the Collection field, type in the collection from which the data will be read, books in this example.

  6. Click Edit schema to define the data structure to be read from the MongoDB collection.

  7. In the Mapping table, the Column field is automatically populated with the defined schema. You do not need to fill in the Parent node path column.

  8. Double-click the first tLogRow component to open its Basic settings view in the Component tab.

  9. In the Mode area, select Table (print values in cells of a table).
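To get a feel for what the first tLogRow should show, the sketch below parses the sample CSV data in plain Python, taking the first line as the header and dropping blank fields, then prints the records as a simple table. It is an illustration only: the helper names are hypothetical, all values are kept as strings (the actual import may infer other types), and the layout is not the exact format printed by tLogRow.

```python
import csv
import io

SAMPLE_CSV = """id,title,author,category
1,Computer Networks,Larry Peterson,Computer Science
2,David Copperfield,Charles Dickens,Language&Literature
3,Life of Pi,Yann Martel,Language&Literature
"""


def csv_to_documents(text):
    """Use the first line as field names and skip blank fields in each record."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: v for k, v in row.items() if v not in (None, "")}
            for row in reader]


def format_table(docs, columns):
    """Return the documents as lines of a plain fixed-width table."""
    widths = {c: max([len(c)] + [len(d.get(c, "")) for d in docs])
              for c in columns}
    lines = [" | ".join(c.ljust(widths[c]) for c in columns)]
    for d in docs:
        lines.append(" | ".join(d.get(c, "").ljust(widths[c]) for c in columns))
    return lines


docs = csv_to_documents(SAMPLE_CSV)
for line in format_table(docs, ["id", "title", "author", "category"]):
    print(line)
```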

Importing data from a JSON file

  1. Double-click the second tMongoDBBulkLoad component to open its Basic settings view in the Component tab.

  2. In the MongoDB directory field, type in the MongoDB home directory. In this example, it is D:/MongoDB.

  3. In the Server and Port fields, fill in the information required for the connection to MongoDB. In this example, type in localhost and 27017.

  4. In the Database field, type in the database to import data to, bookstore in this example.

    In the Collection field, type in the collection to import data to, books in this example.

  5. Browse to the desired data file from which you want to import data. Here, select books.json.

    {
        "id": "4",
        "title": "Les Miserables",
        "author": "Victor Hugo",
        "category": "Language&Literature"
    }
    {
        "id": "5",
        "title": "Advanced Database Systems",
        "author": "Carlo Zaniolo",
        "category": "Database"
    }

  6. Select JSON from the File type list.

  7. Select Insert from the Action on data list.

  8. Click the Advanced settings tab to define the additional arguments as needed.

    In this example, add the argument " --jsonArray" so that the import accepts data expressed as multiple documents within a single JSON array.
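The sample listing for books.json shows the two documents one after another, whereas the --jsonArray option concerns documents wrapped in a single JSON array. Should you need to convert a file from the first shape to the second, the following sketch shows one way to do it in plain Python. The helpers `parse_concatenated` and `as_json_array` are hypothetical names introduced here; whether your file needs this conversion depends on how it was produced.

```python
import json

SAMPLE_JSON = """
{
    "id": "4",
    "title": "Les Miserables",
    "author": "Victor Hugo",
    "category": "Language&Literature"
}
{
    "id": "5",
    "title": "Advanced Database Systems",
    "author": "Carlo Zaniolo",
    "category": "Database"
}
"""


def parse_concatenated(text):
    """Read JSON documents that simply follow one another in the text."""
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)
    return docs


def as_json_array(documents):
    """Serialize the documents as one JSON array."""
    return json.dumps(documents, indent=4)


books = parse_concatenated(SAMPLE_JSON)
print(as_json_array(books))
```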

Validating that the JSON file is imported successfully

  1. Repeat Step 1 through Step 7 described in the procedure Validating that the CSV file is imported successfully to configure the second tMongoDBInput component.

  2. Repeat Step 8 through Step 9 described in the procedure Validating that the CSV file is imported successfully to configure the second tLogRow component.

Saving and executing the Job

  1. Press Ctrl + S to save the Job.

  2. Execute the Job by pressing F6 or clicking Run on the Run tab.

    The console displays the data from the books collection of the bookstore database, including the records imported from both the CSV file books.csv and the JSON file books.json.
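The final content of the collection follows from how the two loads are configured: the first load drops any existing collection before inserting, while the second procedure does not select that option and so adds to what is already there. The sketch below models this with an in-memory list instead of a real MongoDB collection; the function `bulk_load` and the stale starting record are hypothetical, and only the id field is kept for brevity.

```python
def bulk_load(collection, documents, drop_first=False):
    """Return the collection after a load; optionally drop existing content first."""
    if drop_first:
        collection = []
    return collection + list(documents)


# Hypothetical leftover content from an earlier run.
books = [{"id": "0"}]

csv_docs = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
json_docs = [{"id": "4"}, {"id": "5"}]

books = bulk_load(books, csv_docs, drop_first=True)  # first tMongoDBBulkLoad
after_csv = [d["id"] for d in books]

books = bulk_load(books, json_docs)                  # second tMongoDBBulkLoad
after_json = [d["id"] for d in books]

print(after_csv)
print(after_json)
```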