Scenario 2: Upserting records in a collection - 6.1

Talend Components Reference Guide

This scenario upserts records in the collection blog: one existing record has its author changed and one new record is added. Before the upsert, the collection blog contains the following records:

1;Andy;Open Source Outlook;Open Source,Talend;Talend, the leader of the open source world...
2;Andy;Data Integration Overview;Data Integration,Talend;Talend, the leading player in the DI field...
3;Andy;ELT Overview;ELT,Talend;Talend, the big name in the ELT circle...

Such records can be inserted into the database by following the instructions in Scenario 1: Creating a collection and writing data to it.
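
To make the starting state concrete, the three records above can be parsed into documents. The following is a minimal sketch in Python and is not part of the Talend Job. The field names follow the schema defined later in this scenario (id, author, title, keywords, contents), and the helper name parse_records is hypothetical.

```python
# Field names follow the schema used later in this scenario.
FIELDS = ["id", "author", "title", "keywords", "contents"]

INITIAL = """\
1;Andy;Open Source Outlook;Open Source,Talend;Talend, the leader of the open source world...
2;Andy;Data Integration Overview;Data Integration,Talend;Talend, the leading player in the DI field...
3;Andy;ELT Overview;ELT,Talend;Talend, the big name in the ELT circle...
"""

def parse_records(text):
    """Split semicolon-delimited lines into dicts, casting id to int."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = dict(zip(FIELDS, line.split(";")))
        record["id"] = int(record["id"])
        records.append(record)
    return records

blog = parse_records(INITIAL)
for doc in blog:
    print(doc["id"], doc["author"], doc["title"])
```

Only the semicolon acts as the field separator here, so the commas inside the keywords and contents values are left untouched.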

Linking the components

  1. Drop tMongoDBConnection, tFixedFlowInput, tMongoDBOutput, tMongoDBClose, tMongoDBInput and tLogRow from the Palette onto the design workspace.

  2. Rename tFixedFlowInput as blog_post_data, tMongoDBOutput as write_data_to_collection, tMongoDBInput as read_data_from_collection and tLogRow as show_data_from_collection.

  3. Link tMongoDBConnection to tFixedFlowInput using the OnSubjobOk trigger.

  4. Link tFixedFlowInput to tMongoDBOutput using a Row > Main connection.

  5. Link tFixedFlowInput to tMongoDBInput using the OnSubjobOk trigger.

  6. Link tMongoDBInput to tMongoDBClose using the OnSubjobOk trigger.

  7. Link tMongoDBInput to tLogRow using a Row > Main connection.
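
The links above form a chain of subjobs in which each OnSubjobOk trigger starts the next subjob once the previous one has finished without error. The sketch below models that order in Python for illustration only. It covers just the successful path, and the names ON_SUBJOB_OK, ROW_MAIN and execution_order are hypothetical.

```python
# OnSubjobOk triggers between subjobs, as linked in the steps above.
# Each subjob is identified by its start component.
ON_SUBJOB_OK = {
    "tMongoDBConnection": "tFixedFlowInput",
    "tFixedFlowInput": "tMongoDBInput",
    "tMongoDBInput": "tMongoDBClose",
}

# Row > Main connections inside each subjob.
ROW_MAIN = {
    "tFixedFlowInput": "tMongoDBOutput",
    "tMongoDBInput": "tLogRow",
}

def execution_order(start):
    """Follow the OnSubjobOk chain, listing the components of each subjob."""
    order = []
    current = start
    while current is not None:
        subjob = [current]
        if current in ROW_MAIN:
            subjob.append(ROW_MAIN[current])
        order.append(subjob)
        current = ON_SUBJOB_OK.get(current)
    return order

for step, subjob in enumerate(execution_order("tMongoDBConnection"), 1):
    print(step, " -> ".join(subjob))
```

Read this way, the Job opens the connection, writes the upserted data, reads the collection back for display, and only then closes the connection.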

Configuring the components

  1. Double-click tMongoDBConnection to open its Basic settings view.

  2. From the DB Version list, select the MongoDB version you are using.

  3. In the Server and Port fields, enter the connection details.

    In the Database field, enter the name of the MongoDB database.

  4. Double-click tFixedFlowInput to open its Basic settings view.

    Select Use Inline Content (delimited file) in the Mode area.

    In the Content field, enter the data for upserting the MongoDB database, for example:

    1;Andy;Open Source Outlook;Open Source,Talend;Talend, the leader of the open source world...
    2;Andy;Data Integration Overview;Data Integration,Talend;Talend, the leading player in the DI field...
    3;Anderson;ELT Overview;ELT,Talend;Talend, the big name in the ELT circle...
    4;Andy;Big Data Bang;Big Data,Talend;Talend, the driving force for Big Data applications... 

    As shown above, the 3rd record has its author changed and the 4th record is new.

  5. Double-click tMongoDBOutput to open its Basic settings view.

    Select the Use existing connection and Die on error check boxes.

    In the Collection field, enter the name of the collection, namely blog.

    Select Upsert from the Action on data list.

  6. Click the [...] button next to Edit schema to open the schema editor.

  7. Click the [+] button to add five columns in the right part, namely id, author, title, keywords and contents, with id of type Integer and the other four of type String.

    Click the button that copies all the columns to the input table.

    Click OK to close the editor.

  8. In the Advanced Settings view, select the Generate JSON Document check box.

    Select the Remove root node check box.

    In the Data node and Query node fields, enter "data" and "query".

  9. Click the [...] button next to Configure JSON Tree to open the configuration interface.

  10. Right-click the node rootTag and select Add Sub-element from the contextual menu.

    In the dialog box that appears, type in data as the name of the Data node.

    Click OK to close the window.

    Repeat this operation to define query as the Query node.

    Right-click the node data and select Set As Loop Element from the contextual menu.

    Warning

    These nodes are mandatory for update and upsert actions. They enable those actions but are not stored in the database.

  11. Select all the columns under the Schema list and drop them to the data node.

    In the window that appears, select Create as sub-element of target node.

    Click OK to close the window.

    Repeat this operation to drop the id column from the Schema list onto the query node.

  12. Right-click the node id under data and select Add Attribute from the contextual menu.

    In the dialog box that appears, type in type as the attribute name.

    Click OK to close the window.

    Right-click the node @type under id and select Set A Fix Value from the contextual menu.

    In the dialog box that appears, type in integer as the attribute value, ensuring the id values are stored as integers in the database.

    Click OK to close the window.

    Repeat this operation to set this attribute for the id node under query.

    Click OK to close the JSON Tree configuration interface.

  13. Double-click tMongoDBInput to open its Basic settings view.

    Select the Use existing connection check box.

    In the Collection field, enter the name of the collection, namely blog.

    Click the [...] button next to Edit schema to open the schema editor.

    Click the [+] button to add five columns, namely id, author, title, keywords and contents, with id of type Integer and the other four of type String.

    Click OK to close the editor.

    The columns now appear in the left part of the Mapping area.

    For columns author, title, keywords and contents, enter their parent node post so that the data can be retrieved from the correct positions.

  14. Double-click tLogRow to open its Basic settings view.

    In the Mode area, select Table (print values in cells of a table) for better display.
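
The JSON tree configured for tMongoDBOutput pairs each input row with a query node and a data node. The sketch below shows one plausible shape of that pairing in Python. It is an illustration under assumptions, not the exact document that tMongoDBOutput generates. The helper name build_upsert_doc is hypothetical, and the int() cast stands in for the type attribute fixed to integer.

```python
FIELDS = ["id", "author", "title", "keywords", "contents"]

def build_upsert_doc(row):
    """Pair a delimited row with a query node (id only) and a data node (all columns)."""
    data = dict(zip(FIELDS, row.split(";")))
    data["id"] = int(data["id"])  # mirrors the type attribute set to integer
    return {"query": {"id": data["id"]}, "data": data}

doc = build_upsert_doc(
    "3;Anderson;ELT Overview;ELT,Talend;Talend, the big name in the ELT circle..."
)
print(doc["query"])
print(doc["data"]["author"])
```

The query node carries only the id column because that is the key used to find the existing record, while the data node carries every column to be written.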

Executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to run the Job.

    In the execution result, the 3rd record has its author updated and the 4th record is inserted.
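
The effect of the upsert can be reproduced in memory to check the expected outcome. The sketch below is a simplification in Python and does not call MongoDB. The function upsert is hypothetical and assumes that a matching record is replaced by the incoming data, and the records are reduced to id and author for brevity.

```python
def upsert(collection, query, data):
    """Replace the first document matching query, or append data if none matches."""
    for i, existing in enumerate(collection):
        if all(existing.get(key) == value for key, value in query.items()):
            collection[i] = dict(data)
            return "updated"
    collection.append(dict(data))
    return "inserted"

# Collection before the upsert, reduced to id and author.
blog = [
    {"id": 1, "author": "Andy"},
    {"id": 2, "author": "Andy"},
    {"id": 3, "author": "Andy"},
]

# Incoming rows from the inline content, reduced the same way.
incoming = [
    {"id": 1, "author": "Andy"},
    {"id": 2, "author": "Andy"},
    {"id": 3, "author": "Anderson"},
    {"id": 4, "author": "Andy"},
]

results = [upsert(blog, {"id": row["id"]}, row) for row in incoming]
print(results)
for doc in blog:
    print(doc)
```

Records 1 and 2 match existing ids and are rewritten with identical values, record 3 is rewritten with the new author, and record 4 finds no match and is appended.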