Inserting filtered data into an Azure Cosmos DB table - Cloud

Talend Cloud Apps Connectors Guide

Last publication date
2024-03-21

This scenario is intended to help you set up and use connectors in a pipeline. Adapt it to your environment and use case.

Before you begin

Procedure

  1. Click Connections > Add connection.
  2. In the panel that opens, select the type of connection you want to create.

    Example

    Cosmos DB
  3. Select your engine in the Engine list.
    Note:
    • For advanced data processing, it is recommended to use the Remote Engine Gen2 rather than the Cloud Engine for Design.
    • If no Remote Engine Gen2 has been created from Talend Management Console, or if one exists but appears as unavailable (meaning it is not up and running), you cannot select a connection type in the list or save the new connection.
    • The list of available connection types depends on the engine you have selected.
  4. Select the type of connection you want to create.
    Here, select CosmosDB.
  5. Fill in the connection properties to access your Azure Cosmos DB database as described in Azure Cosmos DB properties, check the connection, then click Add dataset.
  6. In the Add a new dataset panel, name your dataset. In this example, the Cosmos DB collection will be used to hold processed data about leads.

    Example

    leads
  7. Fill in the required properties corresponding to the Cosmos DB collection located in your Azure account.
  8. Click Validate to save your dataset.
  9. Repeat these steps to add the Test connection and the dataset that will be used as the source in your pipeline to populate the Cosmos DB collection.
    In this example, a dataset named bank marketing data with the following CSV schema is used:
    CSV Schema:
    age;job;marital;education;default;balance;housing;loan;contact;day;month;duration;campaign;pdays;previous;poutcome;y
  10. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  11. Give the pipeline a meaningful name.

    Example

    Inserting bank marketing data into a CosmosDB table
  12. Click ADD SOURCE and select your source dataset, bank marketing data, in the panel that opens.
  13. Add processors to the pipeline, for example a Type converter processor that converts the balance data from String type to Double type.
  14. Add a Filter processor to keep only the leads who are 30 years old or older, who are managers, and who have a balance greater than 2000 dollars.
  15. Click the ADD DESTINATION item on the pipeline to open the panel in which you select the Cosmos DB dataset that your output data will be inserted into.
  16. In the Configuration tab of the destination, click Main and:
    1. Enable both the Create collection if not exists and Auto ID generation options so that the collection in which the data will be inserted, and the IDs, are created when the pipeline is executed.
    2. Select Insert in the Data action list to insert the data into the dataset when the pipeline is executed.
  17. Click Save to save your configuration.
  18. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  19. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
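
The processing configured in steps 13 and 14 can be pictured outside Pipeline Designer with a small Python sketch. It is only an illustration of the logic, not what Talend runs internally. The header comes from the CSV schema above; the data rows, the function names, and the job value "management" used to identify managers are assumptions made for the example.

```python
import csv
import io

# Header taken from the CSV schema in this scenario; the data rows are made up.
SAMPLE = """age;job;marital;education;default;balance;housing;loan;contact;day;month;duration;campaign;pdays;previous;poutcome;y
58;management;married;tertiary;no;2143;yes;no;unknown;5;may;261;1;-1;0;unknown;no
25;management;single;tertiary;no;5000;no;no;cellular;5;may;100;1;-1;0;unknown;no
45;technician;married;secondary;no;3000;yes;no;unknown;5;may;151;1;-1;0;unknown;no
33;management;single;tertiary;no;1500;yes;yes;unknown;5;may;76;1;-1;0;unknown;no
30;management;divorced;tertiary;no;2000.5;no;no;cellular;6;may;90;2;-1;0;unknown;yes
"""


def convert_types(record):
    """Type converter step: turn balance from a string into a float (Double)."""
    converted = dict(record)
    converted["balance"] = float(record["balance"])
    return converted


def keep_lead(record, manager_job="management"):
    """Filter step: 30 or older, manager, balance greater than 2000.

    The job value that marks a manager is an assumption about the data.
    Age is still a string here (only balance was converted), so it is cast.
    """
    return (
        int(record["age"]) >= 30
        and record["job"] == manager_job
        and record["balance"] > 2000
    )


def run_pipeline(csv_text):
    """Read semicolon-separated rows, convert types, then filter."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return [r for r in map(convert_types, reader) if keep_lead(r)]


leads = run_pipeline(SAMPLE)
for lead in leads:
    print(lead["age"], lead["job"], lead["balance"])
```

Converting balance before filtering matters: comparing the raw string with 2000 would fail, which is why the Type converter processor comes before the Filter processor in the pipeline.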

Results

Your pipeline is executed: the data is processed and filtered, and the output flow is inserted into the Azure Cosmos DB table you have defined.

You can check the log of your pipeline to see details about the volume of data sent to Azure Cosmos DB.
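
To picture what an inserted record might look like when the Auto ID generation option is enabled, here is a minimal Python sketch. Azure Cosmos DB identifies each stored item by a string `id` field; how the connector actually builds that value is not described in this scenario, so the random UUID and the `with_auto_id` helper below are assumptions for illustration only.

```python
import json
import uuid


def with_auto_id(record):
    """Return a copy of the record with a generated string "id" if it has none.

    Hypothetical helper: a random UUID stands in for whatever the
    Auto ID generation option produces.
    """
    document = dict(record)
    document.setdefault("id", str(uuid.uuid4()))
    return document


# A filtered lead reduced to three fields of the CSV schema (values are made up).
lead = {"age": "58", "job": "management", "balance": 2143.0}
document = with_auto_id(lead)
print(json.dumps(document, indent=2))
```

A record that already carries an `id` keeps it, and the original record is left unchanged because the helper works on a copy.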