tSnowflakeBulkExec Standard properties - 7.1

Snowflake

Version
7.1
Language
English (United States)
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Data Governance > Third-party systems > Cloud storages > Snowflake components
Data Quality and Preparation > Third-party systems > Cloud storages > Snowflake components
Design and Development > Third-party systems > Cloud storages > Snowflake components

These properties are used to configure tSnowflakeBulkExec running in the Standard Job framework.

The Standard tSnowflakeBulkExec component belongs to the Cloud family.

The component in this framework is available in all subscription-based Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related to database settings vary depending on your database type selection. For more information about dynamic database connectors, see Dynamic database components.

Basic settings

Database

Select a type of database from the list and click Apply.

Property Type

Select the way the connection details will be set.

  • Built-In: The connection details will be set locally for this component. You need to specify the values for all related connection properties manually.

  • Repository: The connection details stored centrally in Repository > Metadata will be reused by this component. You need to click the [...] button next to it and in the pop-up Repository Content dialog box, select the connection details to be reused, and all related connection properties will be automatically filled in.

This property is not available when other connection component is selected from the Connection Component drop-down list.

Connection Component

Select the component that opens the database connection to be reused by this component.

Account

In the Account field, enter, within double quotation marks, the name that has been assigned to you by Snowflake.

Snowflake Region

Select an AWS region or an Azure region from the Snowflake Region drop-down list.

User Id and Password

Enter, within double quotation marks, your authentication information to log in Snowflake.

  • In the User ID field, enter, within double quotation marks, your login name that has been defined in Snowflake using the LOGIN_NAME parameter of Snowflake. For details, ask the administrator of your Snowflake system.

  • To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

Warehouse

Enter, within double quotation marks, the name of the Snowflake warehouse to be used. This name is case-sensitive and is normally upper case in Snowflake.

Schema

Enter, within double quotation marks, the name of the database schema to be used. This name is case-sensitive and is normally upper case in Snowflake.

Database

Enter, within double quotation marks, the name of the Snowflake database to be used. This name is case-sensitive and is normally upper case in Snowflake.

Table

Click the [...] button and in the displayed wizard, select the Snowflake table to be used.

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

If the Snowflake data type to be handled is VARIANT, OBJECT or ARRAY, while defining the schema in the component, select String for the corresponding data in the Type column of the schema editor wizard.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content window.

Note that if the input value of any non-nullable primitive field is null, the row of data including that field will be rejected.

This component offers the advantage of the dynamic schema feature. This allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. For further information about dynamic schemas, see Talend Studio User Guide.

This dynamic schema feature is designed for the purpose of retrieving unknown columns of a table and is recommended to be used for this purpose only; it is not recommended for the use of creating tables.

Table Action

Select the action to be carried out on the table.

  • NONE: No operation made to the table.
  • DROP_CREATE: Remove the table and create it again.
  • CREATE: Create a new table.
  • CREATE_IF_NOT_EXISTS: Create the specified table if it does not exist.
  • DROP_IF_EXISTS_AND_CREATE: Remove the table if it already exists and create again.
  • CLEAR: Delete the table content.
  • TRUNCATE: Delete all the rows in the table and release the file space back to the operating system.

Output Action

Select the operation to insert, delete, update or merge data in the Snowflake table. By now user should already define a Snowflake table.
  • INSERT: insert new records in a Snowflake table.
  • UPDATE: update existing records in a Snowflake table.
  • UPSERT: create new records and update existing records. In the Upsert Key Column field displayed, specify the key column to be used as the join key for the upsert operation.
  • DELETE: remove records from a Snowflake table.
Storage Select the type of storage where you copy data from:
  • Internal: Select an internal storage to load data. User needs to specify credentials in order to connect to the default Snowflake storage specified in Stage Folder.
  • S3: Select an Amazon S3 bucket to load data.
  • Azure: Select an Azure container to load data.
Stage Folder Specify the path to the default Snowflake stage.

This field is available only when Internal is selected from the Storage drop-down list.

Region Specify the region where the S3 bucket locates.

This field is available only when S3 is selected from the Storage drop-down list.

Access Key and Secret Key Enter the authentication information required to connect to the Amazon S3 bucket to be used.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

This field is available only when S3 is selected from the Storage drop-down list.

Bucket Enter the name of the bucket to be used to load data. This bucket must

This field appears when you select S3 from the Storage drop-down list.

Folder Enter the directory in which you want to store data in this given bucket. This directory is created if it does not exist at runtime.

This field is available only when S3 is selected from the Storage drop-down list.

Protocol Select the protocol used to create Azure connection.

This field is available only when Azure is selected from the Storage drop-down list.

Account Name Enter the name of the Azure storage account you need to access.

This field is available only when Azure is selected from the Storage drop-down list.

Container Specify the Azure container used for storing and managing data.

This field is available only when Azure is selected from the Storage drop-down list.

Folder Enter the directory in which you want to load data. This directory is created if it does not exist at runtime.

This field is available only when S3 or Azure is selected from the Storage drop-down list.

SAS Token Specify the SAS token to grant limited access to objects in your storage account.

To enter the SAS token, click the [...] button next to the SAS token field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

This field is available only when Azure is selected from the Storage drop-down list.

Advanced settings

Use Custom Snowflake Region
Select this check box to use the customized Snowflake region.
  • Region ID: enter, within double quotation marks, a region ID, for example eu-west-1 or east-us-2.azure.

For more information on Snowflake Region ID, see https://docs.snowflake.net/manuals/user-guide/intro-regions.html#region-ids-in-account-urls.

Login Timeout

Specify how long to wait for a response when connecting to Snowflake before returning an error.

Tracing

Select the log level for the Snowflake JDBC driver. If enabled, a standard Java log is generated.

Role

Enter, within double quotation marks, the default access control role to use to initiate the Snowflake session.

This role must already exist and has been granted to the user ID you are using to connect to Snowflake. If this field is left empty, the PUBLIC role is automatically granted. For further information about the Snowflake access control model, see Snowflake documentation at Understanding the Access Control Model.

Allow Snowflake to convert columns and tables to uppercase

Select this check box to convert lowercase in the defined table name and schema column names to uppercase. Note that unquoted identifiers should match the Snowflake Identifier Syntax.

If you deselect the check box, all identifiers are automatically quoted.

This property is not available when you select the Manual Query check box.

For more information on the Snowflake Identifier Syntax, refer to Identifier Syntax.

Custom DB Type Select this check box to specify the data type for each column in the schema.

This property is available only when you select an action with Create Table from the Table Action drop down list in the Basic settings.

Delete Storage Files On Success Delete all files in your storage folder once the Job is run successfully.

This field is not available when you select Use Custom Storage Location.

S3 Max Error Retry

Specify the maximum retry times when an error occurs during loading data to the S3 bucket.

This field appears when you select S3 from the Storage drop-down list.

Azure Max Error Retry

Specify the maximum retry times when an error occurs during loading data to the Azure container.

This field appears when you select Azure from the Storage drop-down list.

Use Custom Stage Path

Select this check box to use the specified stage to upload data.

This field is available only when Internal is selected from the Storage drop-down list in the Basic Settings. Once selected, the Stage Folder in Basic settings is disabled.

Use Custom Storage Location Select this check box to connect to any custom external storage, for example, S3.

Copy Command Options

Select a Copy command mode from the drop-down list to copy data to one of the following Snowflake stage.
  • Default: By default, only read-only string properties with options are used.
  • Table: A table with two columns, Option and Value, are used to configure the command options.
  • Manual: You can set any custom options using a string property.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

NB_LINE

The number of rows processed. This is an After variable and it returns an integer.

NB_SUCCESS

The number of rows successfully processed. This is an After variable and it returns an integer.

NB_REJECT

The number of rows rejected. This is an After variable and it returns an integer.

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule

This component is an end component of a data flow in your Job. It receives data from other components through the Row > Main link.