Before you begin
-
You have downloaded the financial_transactions.avro file
and you have uploaded it to your Amazon S3 bucket.
- You have duplicated the pipeline described in Writing data to a cloud storage (S3) and you will
be working on this duplicated pipeline.
- You have created a Remote Engine Gen2 and its run profile from Talend Management Console.
The Cloud Engine for Design
and its corresponding run profile are embedded by default in Talend Management Console
to help users get started quickly with the app, but installing
the secure Remote Engine Gen2
is recommended for advanced data processing.
Procedure
-
On the Home page of Talend Cloud Pipeline Designer,
click .
-
In the panel that opens, select Snowflake, then click
Next.
-
Select your Remote Engine Gen2 in
the Engine list.
-
Select Database in the Connection
type list and Snowflake in the
Database list.
-
Enter your database JDBC URL and credentials.
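As a reference, a Snowflake JDBC URL generally takes the form jdbc:snowflake://&lt;account&gt;.snowflakecomputing.com/ followed by query parameters. A minimal sketch in Python for assembling one (the account, database, and warehouse names below are placeholders, not values from this example):

```python
# Build a Snowflake JDBC URL from its parts.
# Placeholder values throughout; adjust account, database, schema,
# and warehouse to your own environment.
def snowflake_jdbc_url(account, database, schema="PUBLIC", warehouse=None):
    url = (f"jdbc:snowflake://{account}.snowflakecomputing.com/"
           f"?db={database}&schema={schema}")
    if warehouse:
        url += f"&warehouse={warehouse}"
    return url

print(snowflake_jdbc_url("xy12345", "FINANCE", warehouse="COMPUTE_WH"))
```

Paste the resulting URL into the JDBC URL field, and supply your Snowflake user name and password as the credentials.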
-
Check your connection if needed and click Next.
-
Give a name to your connection, Snowflake connection for
example, then click Validate.
-
Click Add dataset, and fill in the information to connect to
your Snowflake table:
-
Give a display name to your dataset, financial data on
Snowflake for example.
-
In the Type list, select Table or view
name.
-
In the Table name list, select or type the name of
your Snowflake table.
-
In the Column selection field, select the specific
table columns you want to retrieve, or click Select all
to retrieve all the existing ones. In this example, two fields are selected:
transaction_amount and
transaction_code.
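Conceptually, the column selection restricts each record to the chosen fields. A rough Python sketch of the effect (the sample rows are invented for illustration):

```python
# Keep only the selected columns from each record.
# The sample rows are made up; only the field names come from this example.
SELECTED = {"transaction_amount", "transaction_code"}

rows = [
    {"transaction_amount": 42.5, "transaction_code": "TRF", "branch": "NYC"},
    {"transaction_amount": 10.0, "transaction_code": "DEP", "branch": "SFO"},
]

projected = [{k: v for k, v in row.items() if k in SELECTED} for row in rows]
print(projected)
```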
-
Click View sample to check that your data is valid and can
be previewed.
-
Click Validate to save your dataset. On the
Datasets page, the new dataset is added to the list and can
be used as a destination dataset in your pipeline.
The Data Mapping icon next to the destination is disabled for now because the input
schema is not flat.
-
Click the icon and add a Field selector processor after the
Aggregate processor in order to select the fields you want
to keep and flatten the schema. The configuration panel opens.
-
In the Simple selection mode, click to open the Select fields window:
-
Select the fields you want to keep and flatten:
description and
total_amount.
-
Click Edit to close the window.
-
Click Save to save the configuration and preview the
flattened fields.
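What the Field selector does here can be pictured as picking leaf values out of a nested record. A simplified sketch, with an invented nested layout (only the field names description and total_amount come from this example):

```python
# Flatten a nested record by selecting specific leaf fields.
# The nested structure below is hypothetical, for illustration only.
def select_fields(record, paths):
    """Pick leaf values out of a nested record, keyed by the leaf name."""
    flat = {}
    for path in paths:
        node = record
        for part in path.split("."):
            node = node[part]          # walk down the nested structure
        flat[path.split(".")[-1]] = node
    return flat

nested = {"agg": {"total_amount": 152.3},
          "info": {"description": "wire transfer"}}
flat = select_fields(nested, ["agg.total_amount", "info.description"])
print(flat)  # {'total_amount': 152.3, 'description': 'wire transfer'}
```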
-
Now that the input schema is flat, the icon is enabled and allows you to add a Data mapping
processor to the pipeline. The configuration panel opens.
-
In the Configuration tab, click Open
mapping to open the Data mapping processor.
Some input fields are automatically mapped with an output field based on their
names. You can review these and start mapping the rest of your schema:
-
Map the total_amount input field with the
transaction_amount output field.
-
Map the description input field with the
transaction_code output field.
-
Click Validate to confirm your mappings.
The content of the total_amount input field will be written
to the transaction_amount output field
according to the operation set for your database (insert, update, upsert, or
delete).
Likewise, the content of the description input field will be written to
the transaction_code output field.
You can check the result of the mapping in the
Data preview
area.
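The mapping itself can be modeled as a set of input-name to output-name pairs. A small Python sketch of that renaming (the field names come from this example; the logic is an illustration, not the processor's implementation):

```python
# Model the field mapping as input-name -> output-name pairs
# and apply it to a record (illustrative logic only).
MAPPING = {
    "total_amount": "transaction_amount",
    "description": "transaction_code",
}

def apply_mapping(record, mapping):
    """Rename each field of the record according to the mapping."""
    return {mapping.get(k, k): v for k, v in record.items()}

print(apply_mapping({"total_amount": 99.0, "description": "ATM"}, MAPPING))
```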
-
Before executing this pipeline, select Upsert in the
configuration tab of the Snowflake dataset to update existing records and insert new ones in the
Snowflake table. Define the transaction_amount field as the
operation key.
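An upsert with an operation key behaves like "update the row if the key already exists, insert it otherwise". A toy in-memory sketch of that behavior, keyed on transaction_amount (the actual operation is executed by Snowflake, not by the pipeline itself):

```python
# Toy upsert: update rows whose key matches, insert the rest.
# The key field mirrors the operation key chosen above; data is invented.
def upsert(table, incoming, key="transaction_amount"):
    index = {row[key]: i for i, row in enumerate(table)}
    for row in incoming:
        if row[key] in index:
            table[index[row[key]]].update(row)   # update existing row
        else:
            table.append(row)                    # insert new row
    return table

existing = [{"transaction_amount": 10.0, "transaction_code": "OLD"}]
new_rows = [{"transaction_amount": 10.0, "transaction_code": "DEP"},
            {"transaction_amount": 25.0, "transaction_code": "TRF"}]
print(upsert(existing, new_rows))
```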
-
On the top toolbar of Talend Cloud Pipeline Designer,
click the Run button to open the panel allowing you to select
your run profile.
-
Select your run profile in the list (for more information, see Run profiles), then click Run to
run your pipeline.
Results
Once your pipeline is executed, the updated data will be visible in the Snowflake
database table.