Writing data to a cloud data warehouse (Snowflake)

Talend Cloud Pipeline Designer Getting Started Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Pipeline Designer
Content: Deployment > Deploying > Executing Pipelines; Design and Development > Designing Pipelines

Before you begin

  • You have retrieved the financial_transactions.avro file from the Downloads tab in the left panel of this page and you have uploaded it to your Amazon S3 bucket (a minimal upload sketch is shown after this list).

  • You have reproduced and duplicated the pipeline described in Writing data to a cloud storage (S3) and you will be working on this duplicated pipeline.

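If you prefer to script the upload rather than use the AWS console, the following sketch uses the boto3 Python library. The bucket name, object key, and local file path are placeholders, and it assumes your AWS credentials are already configured.

  import boto3

  # Hypothetical values: replace with your own bucket, object key, and local file path.
  bucket = "my-pipeline-bucket"
  local_file = "financial_transactions.avro"
  s3_key = "financial_transactions.avro"

  # Assumes AWS credentials are available (environment variables, profile, or IAM role).
  s3 = boto3.client("s3")
  s3.upload_file(local_file, bucket, s3_key)
  print("Uploaded " + local_file + " to s3://" + bucket + "/" + s3_key)
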
Procedure

  1. On the Home page of Talend Cloud Pipeline Designer, click Connections > Add connection.
  2. In the panel that opens, give a name to your connection, Snowflake connection for example.
  3. Select your Remote Engine Gen2 in the Engine list.
  4. Select Database in the Connection type list and Snowflake in the Database list.
  5. Enter your database JDBC URL and credentials (an example URL format is given after this procedure).
  6. Check your connection if needed and click Add dataset to point to the Snowflake table that will serve as your destination dataset.
  7. In the Add a new dataset panel, fill in the connection information to your Snowflake table:
    1. Give a display name to your dataset, financial data on Snowflake for example.
    2. In the Type list, select Table or view name.
    3. In the Table name list, select or type the name of your Snowflake table.
    4. In the Column selection field, select the specific table columns you want to retrieve, or click Select all to retrieve all the existing ones. In this example, 2 fields are selected: transaction_amount and transaction_code.
  8. Click View sample to check that your data is valid and can be previewed.
  9. Click Validate to save your dataset. On the Datasets page, the new dataset is added to the list and can be used as a destination dataset in your pipeline.

    The Data Mapping icon next to the destination is disabled for now as the input schema is not flat.

  10. Click the + icon and add a Field selector processor after the Aggregate processor in order to select the fields you want to keep and flatten the schema. The configuration panel opens.
  11. In the Simple selection mode, open the Select fields window:
    1. Select the fields you want to keep and flatten: description and total_amount.
    2. Confirm your selection to close the window.
    3. Click Save to save the configuration and preview the flattened fields.
  12. Now that the input schema is flat, the Data Mapping icon is enabled and allows you to add a Data mapping processor to the pipeline. The configuration panel opens.
  13. In the Configuration tab, click Open mapping to open the Data mapping processor.
    Some input fields are automatically mapped to output fields based on their names. You can review these mappings and start mapping the rest of your schema:
    1. Map the total_amount input field with the transaction_amount output field.
    2. Map the description input field with the transaction_code output field.
    3. Click Validate to confirm your mappings.

    The content of the total_amount input field is written to the transaction_amount output field according to the operation set for your database (insert, update, upsert, or delete).

    The content of the description input field is written to the transaction_code output field in the same way.

    You can check the result of the mapping in the Data preview area.
  14. Before executing this pipeline, select Upsert in the configuration tab of the Snowflake dataset to update and insert the new data in the Snowflake table. Define the transaction_amount field as the operation key (a sketch of the equivalent upsert logic is shown after this procedure).
  15. On the top toolbar of Talend Cloud Pipeline Designer, select your run profile in the list (for more information, see Run profiles).
  16. Click the run icon to run your pipeline.
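
For step 5, a Snowflake JDBC URL generally follows the pattern below; the account identifier, warehouse, database, and schema are placeholders to replace with your own values.

  jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?warehouse=<warehouse>&db=<database>&schema=<schema>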
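
The Upsert operation selected in step 14 behaves like a merge keyed on the operation key: rows that match the key are updated, and rows that do not match are inserted. The sketch below, using the snowflake-connector-python package, only illustrates that logic outside the pipeline; the connection values and the financial_data and staging_financial_data table names are assumptions, not names taken from this guide.

  import snowflake.connector

  # Hypothetical connection values: replace with your own account and credentials.
  conn = snowflake.connector.connect(
      account="my_account",
      user="my_user",
      password="my_password",
      warehouse="my_warehouse",
      database="my_database",
      schema="my_schema",
  )

  # Upsert keyed on transaction_amount, mirroring the operation key chosen in step 14.
  merge_sql = """
  MERGE INTO financial_data t
  USING staging_financial_data s
    ON t.transaction_amount = s.transaction_amount
  WHEN MATCHED THEN
    UPDATE SET t.transaction_code = s.transaction_code
  WHEN NOT MATCHED THEN
    INSERT (transaction_amount, transaction_code)
    VALUES (s.transaction_amount, s.transaction_code)
  """

  cur = conn.cursor()
  try:
      cur.execute(merge_sql)
  finally:
      cur.close()
      conn.close()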

Results

Once your pipeline is executed, the updated data will be visible in the Snowflake database table.
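
To check the result outside of Talend Cloud Pipeline Designer, you can query the destination table directly. Below is a minimal sketch using snowflake-connector-python, reusing the same hypothetical connection values and table name as in the upsert sketch above.

  import snowflake.connector

  conn = snowflake.connector.connect(
      account="my_account",
      user="my_user",
      password="my_password",
      warehouse="my_warehouse",
      database="my_database",
      schema="my_schema",
  )
  cur = conn.cursor()
  try:
      # financial_data is a hypothetical name; use the table you selected for the dataset.
      cur.execute("SELECT transaction_amount, transaction_code FROM financial_data LIMIT 10")
      for row in cur.fetchall():
          print(row)
  finally:
      cur.close()
      conn.close()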