Scenario 1: Creating a collection and writing data to it - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario creates the collection blog and writes post data to it, then reads the data back from the collection and displays it sorted by id.

Linking the components

  1. Drop tMongoDBConnection, tFixedFlowInput, tMongoDBOutput, tMongoDBClose, tMongoDBInput and tLogRow onto the workspace.

  2. Rename tFixedFlowInput as blog_post_data, tMongoDBOutput as write_data_to_collection, tMongoDBInput as read_data_from_collection and tLogRow as show_data_from_collection.

  3. Link tMongoDBConnection to tFixedFlowInput using the OnSubjobOk trigger.

  4. Link tFixedFlowInput to tMongoDBOutput using a Row > Main connection.

  5. Link tFixedFlowInput to tMongoDBInput using the OnSubjobOk trigger.

  6. Link tMongoDBInput to tMongoDBClose using the OnSubjobOk trigger.

  7. Link tMongoDBInput to tLogRow using a Row > Main connection.
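The links above determine the order in which the subjobs run. The following is a minimal sketch of that order, not Talend's own scheduling code: it models the OnSubjobOk triggers and Row > Main connections as plain data and walks the trigger chain. The names TRIGGERS, ROWS and subjob_order are hypothetical and exist only for this illustration.

```python
# Links copied from the steps above; the data structures are illustrative only.
TRIGGERS = [  # OnSubjobOk links: (source component, target component)
    ("tMongoDBConnection", "tFixedFlowInput"),
    ("tFixedFlowInput", "tMongoDBInput"),
    ("tMongoDBInput", "tMongoDBClose"),
]
ROWS = [  # Row > Main links: (source component, target component)
    ("tFixedFlowInput", "tMongoDBOutput"),
    ("tMongoDBInput", "tLogRow"),
]


def subjob_order(triggers, rows):
    """Return the subjobs (lists of components) in the order the triggers fire."""
    targets = {dst for _, dst in triggers}
    next_of = dict(triggers)
    row_of = dict(rows)
    # The Job starts at the one component that no trigger points to.
    (start,) = [src for src, _ in triggers if src not in targets]
    order = []
    current = start
    while current is not None:
        subjob = [current]
        # Follow the Row > Main chain inside the current subjob.
        while subjob[-1] in row_of:
            subjob.append(row_of[subjob[-1]])
        order.append(subjob)
        current = next_of.get(current)
    return order


for step, subjob in enumerate(subjob_order(TRIGGERS, ROWS), start=1):
    print(step, " -> ".join(subjob))
```

Under this model the connection is opened first, the data is written next, the collection is read and logged after that, and the connection is closed last.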

Configuring the components

  1. Double-click tMongoDBConnection to open its Basic settings view.

  2. From the DB Version list, select the MongoDB version you are using.

  3. In the Server and Port fields, enter the connection details.

    In the Database field, enter the name of the MongoDB database (talend in this scenario).

  4. Double-click tFixedFlowInput to open its Basic settings view.

    Select Use Inline Content (delimited file) in the Mode area.

    In the Content field, enter the data to write to the MongoDB database, for example:

    1;Andy;Open Source Outlook;Open Source,Talend;Talend, the leader of the open source world...
    3;Andy;ELT Overview;ELT,Talend;Talend, the big name in the ELT circle...
    2;Andy;Data Integration Overview;Data Integration,Talend;Talend, the leading player in the DI field...

  5. Double-click tMongoDBOutput to open its Basic settings view.

    Select the Use existing connection and Drop collection if exist check boxes.

    In the Collection field, enter the name of the collection, namely blog.

  6. Click the [...] button next to Edit schema to open the schema editor.

  7. Click the [+] button to add five columns in the right part, namely id, author, title, keywords and contents. Set the type of id to Integer and the type of the other four columns to String.

    Click the copy button to copy all the columns to the input table.

    Click OK to close the editor.

  8. The columns now appear in the left part of the Mapping area.

    For columns author, title, keywords and contents, enter their parent node post. By doing so, those nodes reside under the node post in the MongoDB collection.

  9. Double-click tMongoDBInput to open its Basic settings view.

    Select the Use existing connection check box.

    In the Collection field, enter the name of the collection, namely blog.

  10. Click the [...] button next to Edit schema to open the schema editor.

  11. Click the [+] button to add five columns, namely id, author, title, keywords and contents. Set the type of id to Integer and the type of the other four columns to String.

    Click OK to close the editor.

  12. The columns now appear in the left part of the Mapping area.

    For columns author, title, keywords and contents, enter their parent node post so that the data can be retrieved from the correct positions.

  13. In the Sort by area, click the [+] button to add one line and enter id under Column.

    Select asc from the Order asc or desc? column to the right of the id column. This way, the retrieved records will appear in ascending order of the id column.
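The effect of this configuration can be sketched in plain Python. This is an illustration under stated assumptions rather than what the components actually execute: a list stands in for the collection blog, and the names CONTENT, SCHEMA, PARENT, parse_rows, to_document and from_document are hypothetical. It shows the inline content being split on semicolons, the four columns being nested under the parent node post on write, and the rows being flattened and sorted by id on read.

```python
# Inline content from the tFixedFlowInput step, semicolon-delimited.
CONTENT = """\
1;Andy;Open Source Outlook;Open Source,Talend;Talend, the leader of the open source world...
3;Andy;ELT Overview;ELT,Talend;Talend, the big name in the ELT circle...
2;Andy;Data Integration Overview;Data Integration,Talend;Talend, the leading player in the DI field..."""

SCHEMA = ["id", "author", "title", "keywords", "contents"]
# Parent node entered in the Mapping area; id has no parent node.
PARENT = {"author": "post", "title": "post", "keywords": "post", "contents": "post"}


def parse_rows(content, sep=";"):
    """Split each line into the schema columns; id is an Integer column."""
    rows = []
    for line in content.splitlines():
        row = dict(zip(SCHEMA, line.split(sep)))
        row["id"] = int(row["id"])
        rows.append(row)
    return rows


def to_document(row):
    """Write side: place each column under its parent node, if it has one."""
    doc = {}
    for column, value in row.items():
        parent = PARENT.get(column)
        if parent is None:
            doc[column] = value
        else:
            doc.setdefault(parent, {})[column] = value
    return doc


def from_document(doc):
    """Read side: fetch each column from under its parent node."""
    row = {}
    for column in SCHEMA:
        parent = PARENT.get(column)
        source = doc if parent is None else doc[parent]
        row[column] = source[column]
    return row


# A list stands in for the collection; documents keep the input order 1, 3, 2.
collection = [to_document(row) for row in parse_rows(CONTENT)]

# Sort by id, ascending, as configured in the Sort by area.
rows_read = sorted((from_document(doc) for doc in collection), key=lambda r: r["id"])

print(collection[0])
print([row["id"] for row in rows_read])
```

Because the same parent node is used on both sides, the read-side mapping finds each value where the write-side mapping put it.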

Executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to run the Job.

  3. In the MongoDB command line client, switch to the database talend and read the data from the collection blog. The fields author, title, keywords and contents all reside under the node post, and the records are stored in the same order as the source input.