Writing data to a cloud data warehouse (Snowflake)

Talend Cloud Pipeline Designer Getting Started Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Pipeline Designer
Content: Deployment > Deploying > Executing Pipelines; Design and Development > Designing Pipelines
Last publication date: 2024-03-05

Before you begin

  • You have downloaded the financial_transactions.avro file and uploaded it to your Amazon S3 bucket (a scripted upload example follows this list).

  • You have reproduced and duplicated the pipeline described in Writing data to a cloud storage (S3) and you will be working on this duplicated pipeline.
  • You have created a Remote Engine Gen2 and its run profile from Talend Management Console.

    The Cloud Engine for Design and its corresponding run profile are embedded by default in Talend Management Console to help you get started quickly with the app, but it is recommended to install the secure Remote Engine Gen2 for advanced data processing.
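If you prefer to script the first prerequisite, the sketch below uploads the Avro file with boto3 (assuming your AWS credentials are already configured); the bucket name and object key are placeholders for your own values.

  import boto3

  # Upload the sample Avro file to the S3 bucket used by your source dataset.
  # "my-pipeline-bucket" and the "input/" prefix are placeholders, not values from this guide.
  s3 = boto3.client("s3")
  s3.upload_file(
      Filename="financial_transactions.avro",
      Bucket="my-pipeline-bucket",
      Key="input/financial_transactions.avro",
  )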

Procedure

  1. On the Home page of Talend Cloud Pipeline Designer, click Connections > Add connection.
  2. In the panel that opens, select Snowflake, then click Next.
  3. Select your Remote Engine Gen2 in the Engine list.
  4. Select Database in the Connection type list and Snowflake in the Database list.
  5. Enter your database JDBC URL and credentials.
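    A Snowflake JDBC URL typically follows the pattern below; the account identifier, warehouse, database, and schema are placeholders for your own values.

      jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?warehouse=<warehouse>&db=<database>&schema=<schema>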
  6. Check your connection if needed and click Next.
  7. Give a name to your connection, Snowflake connection for example, then click Validate.
  8. Click Add dataset, and fill in the connection information to your Snowflake table:
    1. Give a display name to your dataset, financial data on Snowflake for example.
    2. In the Type list, select Table or view name.
    3. In the Table name list, select or type the name of your Snowflake table.
    4. In the Column selection field, select the specific table columns you want to retrieve, or click Select all to retrieve all the existing ones. In this example, 2 fields are selected: transaction_amount and transaction_code.
  9. Click View sample to check that your data is valid and can be previewed.
    Preview of the Snowflake data sample.
  10. Click Validate to save your dataset. On the Datasets page, the new dataset is added to the list and can be used as a destination dataset in your pipeline.
    A pipeline with an S3 source, a Python 3 processor, a Filter processor, an Aggregate processor, and a Snowflake destination.

    The Data Mapping icon next to the destination is disabled for now as the input schema is not flat.

  11. Click the Plus icon and add a Field selector processor after the Aggregate processor in order to select the fields you want to keep and flatten the schema. The configuration panel opens.
  12. In the Simple selection mode, click Edit to open the Select fields window:
    1. Select the fields you want to keep and flatten: description and total_amount.
    2. Confirm your selection to close the window.
    3. Click Save to save the configuration and preview the flattened fields.
    Preview of the Field selector processor after reorganizing the financial records.
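    Conceptually, the Field selector keeps only the two listed fields and exposes them at the top level of each record. The short Python sketch below illustrates that idea; the nested record shape and its values are purely hypothetical.

      # Hypothetical aggregated record: the actual nesting depends on your Aggregate setup.
      record = {"group": {"description": "ATM withdrawal"}, "total_amount": 1250.0}

      # Keep only the two fields of interest and expose them as a flat schema.
      flat_record = {
          "description": record["group"]["description"],
          "total_amount": record["total_amount"],
      }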
  13. Now that the input schema is flat, the Data Mapping icon is enabled: click it to add a Data mapping processor to the pipeline. The configuration panel opens.
    Icon to add a Data mapping processor in the canvas.
  14. In the Configuration tab, click Open mapping to open the Data mapping processor.
    Some input fields are automatically mapped with an output field based on their names. You can review these and start mapping the rest of your schema:
    1. Map the total_amount input field with the transaction_amount output field.
    2. Map the description input field with the transaction_code output field.
    3. Click Validate to confirm your mappings.
    Data mapper page with a preview of the mapped records.

    The content of the total_amount input field will be written to the transaction_amount output field, according to the operation set for your database (insert, update, upsert, or delete).

    The content of the description input field will be written to the transaction_code output field.

    You can check the result of the mapping in the Data preview area.
    The same pipeline as before, with a Data mapper processor added before the Snowflake destination.
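    As a mental model, these two mappings simply rename the flattened fields so that they match the Snowflake table columns. The Python sketch below is illustrative only; the field values are made up.

      # Rename the flattened fields to the column names expected by the Snowflake table.
      flat_record = {"description": "ATM withdrawal", "total_amount": 1250.0}
      output_record = {
          "transaction_code": flat_record["description"],
          "transaction_amount": flat_record["total_amount"],
      }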
  15. Before executing this pipeline, select Upsert in the Configuration tab of the Snowflake dataset to update existing records and insert the new data in the Snowflake table. Define the transaction_amount field as the operation key.
    The Snowflake destination configuration panel shows the Upsert action selected.
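    In practice, Upsert with transaction_amount as the operation key means: if a row with the same transaction_amount already exists it is updated, otherwise a new row is inserted. The Python sketch below only illustrates that rule; it is not how the pipeline applies the operation internally.

      # Illustrative only: an in-memory "table" keyed by transaction_amount.
      table = {1250.0: {"transaction_amount": 1250.0, "transaction_code": "OLD"}}

      def upsert(row):
          key = row["transaction_amount"]
          if key in table:
              table[key].update(row)  # key found: update the existing row
          else:
              table[key] = row        # key not found: insert a new row

      upsert({"transaction_amount": 1250.0, "transaction_code": "ATM withdrawal"})
      upsert({"transaction_amount": 300.0, "transaction_code": "CARD payment"})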
  16. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  17. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.

Results

Once your pipeline is executed, the updated data will be visible in the Snowflake database table.
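To check the result outside Talend Cloud Pipeline Designer, you can query the table directly. The sketch below uses the Snowflake Python connector; the connection parameters and the table name are placeholders to adapt to your environment.

  import snowflake.connector

  # Placeholder credentials and a hypothetical table name: replace with your own values.
  conn = snowflake.connector.connect(
      user="MY_USER",
      password="MY_PASSWORD",
      account="MY_ACCOUNT",
      warehouse="MY_WAREHOUSE",
      database="MY_DATABASE",
      schema="PUBLIC",
  )
  for row in conn.cursor().execute(
      "SELECT transaction_code, transaction_amount FROM financial_transactions LIMIT 10"
  ):
      print(row)
  conn.close()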