tSnowflakeBulkExec Standard properties - 7.3

Snowflake

Version
7.3
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-21

These properties are used to configure tSnowflakeBulkExec running in the Standard Job framework.

The Standard tSnowflakeBulkExec component belongs to the Cloud family.

The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related to database settings vary depending on your database type selection. For more information about dynamic database connectors, see Dynamic database components.

Basic settings

Database

Select a type of database from the list and click Apply.

Property Type

Select the way the connection details will be set.

  • Built-In: The connection details will be set locally for this component. You need to specify the values for all related connection properties manually.

  • Repository: The connection details stored centrally in Repository > Metadata will be reused by this component. You need to click the [...] button next to it and in the pop-up Repository Content dialog box, select the connection details to be reused, and all related connection properties will be automatically filled in.

This property is available when Use this Component is selected from the Connection Component drop-down list.

Connection Component

Select the component that opens the database connection to be reused by this component.

Account

In the Account field, enter, in double quotation marks, the account name that has been assigned to you by Snowflake.

This field is available only when Use this Component is selected from the Connection Component drop-down list.

Authentication Type

Set the authentication type.
  • Basic: Select this option if key pair authentication is not enabled.
  • Key Pair: Select this option if key pair authentication is enabled. For information about key pair authentication, see Using Key Pair Authentication.
  • OAuth 2.0: Select this option to use external OAuth for data accessing. See External OAuth Overview for related information.
Note: Before selecting the Key Pair option, make sure you have set the key pair authentication data in the Basic settings view of the tSetKeystore component as follows.
  • Leave the TrustStore type field unchanged;
  • Set TrustStore file to "";
  • Clear the TrustStore password field;
  • Select Need Client authentication;
  • Enter the path to the key store file in double quotation marks in the KeyStore file field (or click the […] button to the right of the KeyStore file field and navigate to the key store file);
  • Enter the key store file password in the KeyStore password field;
  • Clear the Check server identity option.
Note: The OAuth 2.0 option is available only if you have installed the R2020-06 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

OAuth token endpoint

Enter OAuth 2.0 token endpoint.

This option is available when OAuth 2.0 is selected from the Authentication Type drop-down list.

Client ID

Enter the client ID of your application.

This option is available when OAuth 2.0 is selected from the Authentication Type drop-down list.

Client Secret

Enter the client secret of your application.

This option is available when OAuth 2.0 is selected from the Authentication Type drop-down list.

Grant type

Set the grant type for retrieving the access token. Two options are provided: Client Credentials and Password.

See Client Credentials and Resource Owner Password Credentials for related information.

This option is available when OAuth 2.0 is selected from the Authentication Type drop-down list.

Note: This option is available only if you have installed the R2020-10 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

OAuth username

Enter the OAuth username.

This option is available when Password is selected from the Grant type drop-down list.

OAuth password

Enter the OAuth password.

To enter the password, click the [...] button next to the password field, enter the password between double quotes in the pop-up dialog box, and then click OK to save the settings.

This option is available when Password is selected from the Grant type drop-down list.

Note: OAuth password does not support spaces.

Scope

Enter the scope. See Scopes for related information.

This option is available when OAuth 2.0 is selected from the Authentication Type drop-down list.
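
The OAuth 2.0 fields above (token endpoint, client ID, client secret, grant type, scope, and the optional username and password) map onto a standard token request. The following is a minimal sketch of that request shape as defined by RFC 6749, not a description of what the component sends internally. The function name and the sample values are hypothetical, and some identity providers expect the client credentials in the request body rather than in a Basic authorization header.

```python
import base64
from urllib.parse import urlencode


def build_token_request(client_id, client_secret, scope,
                        grant_type="client_credentials",
                        username=None, password=None):
    """Return (headers, body) for an OAuth 2.0 token request (RFC 6749)."""
    if grant_type not in ("client_credentials", "password"):
        raise ValueError("unsupported grant type: " + grant_type)
    params = {"grant_type": grant_type, "scope": scope}
    if grant_type == "password":
        # The Password grant type also needs the OAuth username and password.
        if not username or not password:
            raise ValueError("the password grant needs a username and a password")
        params["username"] = username
        params["password"] = password
    # Assumption: the client authenticates with HTTP Basic auth (ID:secret).
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": "Basic " + credentials,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return headers, urlencode(params)


# Placeholder values; replace them with the ones from your identity provider.
headers, body = build_token_request(
    "my-client", "my-secret", "session:role:ANALYST")
```

The body would be sent as a POST request to the URL entered in the OAuth token endpoint field.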

User Id and Password

Enter, in double quotation marks, your authentication information to log in to Snowflake.

  • In the User ID field, enter, in double quotation marks, your login name that has been defined in Snowflake using the LOGIN_NAME parameter of Snowflake. For details, ask the administrator of your Snowflake system.

  • To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

This field is available only when Use this Component is selected from the Connection Component drop-down list.

Warehouse

Enter, in double quotation marks, the name of the Snowflake warehouse to be used. This name is case-sensitive and is normally upper case in Snowflake.

This field is available only when Use this Component is selected from the Connection Component drop-down list.

Schema

Enter, within double quotation marks, the name of the database schema to be used. This name is case-sensitive and is normally upper case in Snowflake.

This field is available only when Use this Component is selected from the Connection Component drop-down list.

Database

Enter, in double quotation marks, the name of the Snowflake database to be used. This name is case-sensitive and is normally upper case in Snowflake.

This field is available only when Use this Component is selected from the Connection Component drop-down list.

Table

Click the [...] button and in the displayed wizard, select the Snowflake table to be used.

To load the data into a new table, select Use custom object in the wizard and enter the name of the new table in Object Name field.

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

If the Snowflake data type to be handled is VARIANT, OBJECT or ARRAY, while defining the schema in the component, select String for the corresponding data in the Type column of the schema editor wizard.
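
As an illustration of the String mapping for VARIANT, OBJECT and ARRAY columns, the sketch below serializes a semi-structured value to JSON text before it is written to the file to be loaded. It assumes that the upstream data is available as a structured value and that JSON text is an acceptable representation; the helper name and the sample row are hypothetical.

```python
import json


def to_string_column(value):
    """Serialize a semi-structured value to JSON text for a String column."""
    # Compact separators and sorted keys keep the output deterministic.
    return json.dumps(value, separators=(",", ":"), sort_keys=True)


# Hypothetical row: PAYLOAD is typed String in the component schema.
row = {"ID": 1, "PAYLOAD": to_string_column({"tags": ["a", "b"], "active": True})}
```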

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content window.

Note that if the input value of any non-nullable primitive field is null, the row of data including that field will be rejected.

This component offers the advantage of the dynamic schema feature. This allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. For further information about dynamic schemas, see Talend Studio User Guide.

This dynamic schema feature is designed for retrieving unknown columns of a table and is recommended for this purpose only; it is not recommended for creating tables.

Table Action

Select the action to be carried out on the table.

  • NONE: Leave the table as is.
  • DROP_CREATE: Remove the table and create it again.
  • CREATE: Create a new table.
  • CREATE_IF_NOT_EXISTS: Create the table if it does not exist.
  • DROP_IF_EXISTS_AND_CREATE: Remove the table if it already exists and create it again.
  • CLEAR: Remove all the data records in the table.
  • TRUNCATE: Remove all the rows in the table. This action releases the space occupied by the table.

Output Action

Select the operation you want to perform on the incoming data and the data records in the Snowflake database table. You can insert, delete, update or merge data in the Snowflake table. This option assumes that the Snowflake table specified in the Table field already exists.

  • INSERT: Insert new records in the Snowflake table.
    Note: Because this operation uses the Snowflake COPY INTO command, which drops the use of temporary tables and currently does not support data validation, unexpected results may occur when you retrieve rejected records. See Transforming Data During a Load for related information.
  • UPDATE: Update existing records in the Snowflake table.
  • UPSERT: Create new records and update existing records. You need to specify a schema column as the join key from the Upsert Key Column drop-down list or specify to use the schema keys for the operation by selecting Use schema keys for upsert in the Advanced settings view.
    Note: The Upsert Key Column drop-down list is available when Use schema keys for upsert is not selected.
  • DELETE: Remove records from the Snowflake table.
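
The effect of the UPSERT action with a join key can be pictured with a small in-memory sketch. This only illustrates the semantics (update the records whose key already exists, insert the others); it is not how Snowflake or the component executes the operation, and the function name and sample rows are hypothetical.

```python
def upsert(table_rows, incoming_rows, key):
    """Return the table content after an upsert keyed on `key`."""
    merged = {row[key]: dict(row) for row in table_rows}
    for row in incoming_rows:
        if row[key] in merged:
            merged[row[key]].update(row)   # key exists: update the record
        else:
            merged[row[key]] = dict(row)   # key is new: insert the record
    return list(merged.values())


table = [{"ID": 1, "NAME": "a"}, {"ID": 2, "NAME": "b"}]
incoming = [{"ID": 2, "NAME": "B"}, {"ID": 3, "NAME": "c"}]
result = upsert(table, incoming, key="ID")
```

Here ID plays the role of the column selected in Upsert Key Column.
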

Storage

Select the type of storage from which the data will be loaded to the table.

  • Internal: Load data from files stored in an internal Snowflake storage folder. You also need to specify the folder within double quotation marks in Stage Folder.
  • S3: Load data from files stored in a folder under an Amazon S3 bucket. You also need to provide information about your S3 user account, including Region, Access Key (in double quotation marks), Secret Key, Bucket (in double quotation marks), and Folder (in double quotation marks).
  • Azure: Load data from files stored in an Azure folder. You also need to provide information about your Azure user account, including Protocol, Account Name (in double quotation marks), Container (in double quotation marks), Folder (in double quotation marks), and SAS Token.

Stage Folder

Specify the Snowflake stage folder in the internal storage to load data from.

This field is available when Internal is selected from the Storage drop-down list in the Basic settings view and the Use Custom Storage Location option is not selected in the Advanced settings view.

Region

Specify the region where the S3 bucket is located.

This field is available when you select S3 from the Storage drop-down list in the Basic settings view.

Access Key and Secret Key

Enter the authentication information required to connect to the Amazon S3 bucket to be used.

To enter the secret key, click the [...] button next to the secret key field, and then in the pop-up dialog box enter the secret key between double quotes and click OK to save the settings.

This field is available when you select S3 from the Storage drop-down list in the Basic settings view.

Bucket

Enter the name of the bucket to be used to load data. This bucket must already exist.

This field is available when you select S3 from the Storage drop-down list in the Basic settings view.

Folder

Enter the folder (in double quotation marks) from which you want to load data.

This field is available when S3 or Azure is selected from the Storage drop-down list.

Protocol

Select the protocol used to create the Azure connection.

This field is available when you select Azure from the Storage drop-down list in the Basic settings view.

Account Name

Enter the name (in double quotation marks) of the Azure storage account you need to access.

This field is available when you select Azure from the Storage drop-down list in the Basic settings view.

Container and Folder

Specify the Azure container and folder (in double quotation marks) used for storing and managing data.

This field is available when you select Azure from the Storage drop-down list in the Basic settings view.

SAS Token

Specify the SAS token to grant limited access to objects in your storage account.

To enter the SAS token, click the [...] button next to the SAS token field, and then in the pop-up dialog box enter the SAS token between double quotes and click OK to save the settings.

This field is available when you select Azure from the Storage drop-down list in the Basic settings view.
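
The storage-dependent fields described above can be summarized as a lookup from the Storage selection to the fields that have to be filled in. The sketch below is a hypothetical pre-flight check built only from the field names listed in this section; it ignores the Advanced settings options (such as Use Custom Stage Prefix) that make Stage Folder unavailable.

```python
# Fields to provide for each Storage selection, as listed in Basic settings.
REQUIRED_FIELDS = {
    "Internal": ["Stage Folder"],
    "S3": ["Region", "Access Key", "Secret Key", "Bucket", "Folder"],
    "Azure": ["Protocol", "Account Name", "Container", "Folder", "SAS Token"],
}


def missing_fields(storage, settings):
    """Return the required fields that are absent or empty in `settings`."""
    if storage not in REQUIRED_FIELDS:
        raise ValueError("unknown storage type: " + storage)
    return [name for name in REQUIRED_FIELDS[storage] if not settings.get(name)]


# Hypothetical, incomplete S3 configuration.
gaps = missing_fields("S3", {"Region": "eu-west-1", "Bucket": "my-bucket"})
```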

Advanced settings

Additional JDBC Parameters

Specify additional connection properties for the database connection you are creating. The properties are separated by semicolons and each property is a key-value pair, for example, encryption=1;clientname=Talend.

This field is available only when you select Use this Component from the Connection Component drop-down list and select Internal from the Storage drop-down list in the Basic settings view.
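
The semicolon-separated key-value format of Additional JDBC Parameters can be produced or checked with a few lines of code. This is a minimal sketch with hypothetical helper names; it only handles the format described above and does not validate the property names against the Snowflake JDBC driver.

```python
def format_jdbc_parameters(params):
    """Join key-value pairs as key=value;key=value."""
    for key, value in params.items():
        if ";" in key or "=" in key or ";" in str(value):
            raise ValueError(f"unsupported character in {key!r}")
    return ";".join(f"{key}={value}" for key, value in params.items())


def parse_jdbc_parameters(text):
    """Split key=value;key=value back into a dictionary of strings."""
    result = {}
    for pair in text.split(";"):
        if not pair.strip():
            continue  # tolerate a trailing semicolon
        key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"missing '=' in {pair!r}")
        result[key.strip()] = value.strip()
    return result


text = format_jdbc_parameters({"encryption": 1, "clientname": "Talend"})
```

In the component field, the resulting string is entered in double quotation marks like the other text settings.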

Login Timeout

Specify the timeout period (in minutes) of Snowflake login attempts. An error will be generated if no response is received in this period.

Role

Enter, in double quotation marks, the default access control role to use to initiate the Snowflake session.

This role must already exist and have been granted to the user ID you are using to connect to Snowflake. If this field is left empty, the PUBLIC role is automatically granted. For information about the Snowflake access control model, see Understanding the Access Control Model.

Region ID (Deprecated)

Enter a region ID in double quotation marks, for example eu-west-1 or east-us-2.azure. For information about Snowflake Region ID, see Supported Cloud Regions.

For Snowflake components other than tSnowflakeConnection, this field is available when you select Use this Component from the Connection Component drop-down list in the Basic settings view.

Note: This field is available only when you have installed the R2021-04 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

Allow Snowflake to convert columns and tables to uppercase

Select this check box to convert lowercase in the defined table name and schema column names to uppercase. Note that unquoted identifiers should match the Snowflake Identifier Syntax.

If you deselect the check box, all identifiers are automatically quoted.

This property is not available when you select the Manual Query check box.

For more information on the Snowflake Identifier Syntax, see Identifier Syntax.
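
The difference between the two states of this check box can be sketched as follows. Snowflake resolves unquoted identifiers as uppercase and keeps quoted identifiers exactly as written. The helper names are hypothetical, and the exact SQL text the component generates is an assumption; only the Snowflake identifier rules themselves are taken as given.

```python
import re

# Unquoted Snowflake identifiers: a letter or underscore first, then
# letters, digits, underscores or dollar signs.
_UNQUOTED = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def sql_identifier(name, convert_to_uppercase):
    """Return the identifier as it would be written in a SQL statement."""
    if convert_to_uppercase:
        if not _UNQUOTED.fullmatch(name):
            raise ValueError(f"{name!r} is not a valid unquoted identifier")
        return name  # left unquoted
    # Check box cleared: quote the name and double any embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def resolved_name(name, convert_to_uppercase):
    """Return the name Snowflake resolves the identifier to."""
    return name.upper() if convert_to_uppercase else name
```

For example, with the check box selected, my_table resolves to MY_TABLE; with the check box cleared, it is written as "my_table" and keeps its lowercase form.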

Use schema keys for upsert

Select this option to use schema keys for the Upsert operation. This option is available when you select UPSERT from the Output Action drop-down list in the Basic settings view.

Note: This option is available only when you have installed the R2020-09 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

Temporary Table Schema

Specifies an existing schema for the temporary table.

Custom DB Type

Select this check box to specify the DB type for each column in the schema.

This property is available only when you select an action with Create Table from the Table Action drop-down list in the Basic settings view.

Delete Storage Files On Success

Delete all the files in the storage folder once the data is loaded to the table successfully.

This field is not available when you select Use Custom Storage Location.

Snowflake access to storage

Specifies the authentication method for the COPY command when accessing the S3 bucket. See Additional Cloud Provider Parameters for related information.

Note:
  • This option is available when S3 is selected from the Storage drop-down list in the Basic settings view.
  • This option is available only when you have installed the R2021-10 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

  • Credentials: Use the S3 credentials for the authentication.
  • Storage integration: Use a storage integration for the authentication. In this case, enter the integration name in the Storage integration field.

S3 assume role

If you temporarily need some access permissions associated to an AWS IAM role that is not granted to your user account, select this check box to assume that role. Then specify the values for the following parameters to create a new assumed role session.

Ensure that access to this role has been granted to your user account by the trust policy associated to this role. If you are not certain about this, ask the owner of this role or your AWS administrator.

Note:
  • This option is available when S3 is selected from the Storage drop-down list in the Basic settings view.
  • This option is available only when you have installed the R2021-10 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

  • Role ARN: the Amazon Resource Name (ARN) of the role to assume. You can find this ARN on the Summary page of the role to be used on your AWS portal. For example, a role ARN looks like arn:aws:iam::[aws_account_number]:role/[role_name].

  • Role session name: enter the name you want to use to uniquely identify your assumed role session. This name can contain upper- and lower-case alphanumeric characters with no spaces. You can also include underscores or any of the following characters: =,.@-.

  • Session duration: the duration (in minutes) for which you want the assumed role session to be active. This duration cannot exceed the maximum duration which your AWS administrator has set. The duration defaults to 15 minutes.

For an example about an IAM role and its related policy types, see Create and Manage AWS IAM Roles from the AWS documentation.
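
The constraints on the assume-role parameters can be checked before a Job is run. The sketch below is a hypothetical validator based on the rules stated above: an ARN of the form arn:aws:iam::[aws_account_number]:role/[role_name], a session name without spaces, and a duration in minutes that defaults to 15. The 12-digit account number and the conversion to seconds (the unit used by the AWS STS AssumeRole API) are assumptions added for the example.

```python
import re

# Assumption: AWS account numbers are 12 digits long.
_ROLE_ARN = re.compile(r"arn:aws:iam::\d{12}:role/.+")
# Alphanumerics, underscores and the characters =,.@- with no spaces.
_SESSION_NAME = re.compile(r"[A-Za-z0-9_=,.@-]+")


def assume_role_settings(role_arn, session_name, duration_minutes=15):
    """Validate the three parameters and return them as a dictionary."""
    if not _ROLE_ARN.fullmatch(role_arn):
        raise ValueError("unexpected role ARN format: " + role_arn)
    if not _SESSION_NAME.fullmatch(session_name):
        raise ValueError("invalid role session name: " + session_name)
    if duration_minutes <= 0:
        raise ValueError("the session duration must be positive")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": duration_minutes * 60,
    }


# Placeholder account number and role name.
settings = assume_role_settings(
    "arn:aws:iam::123456789012:role/loader", "talend_load-1")
```
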

S3 Max Error Retry

Specify the maximum number of retries when an error occurs while loading data to or from the S3 folder. This parameter defaults to 3. A value of -1 specifies the maximum possible retries. Only -1 or positive integers are accepted.

This field is available when you select S3 from the Storage drop-down list in the Basic settings view.

Azure Max Error Retry

Specify the maximum number of retries when an error occurs while loading data to or from the Azure folder. This parameter defaults to 3. A value of -1 specifies the maximum possible retries. Only -1 or positive integers are accepted.

This field is available when you select Azure from the Storage drop-down list in the Basic settings view.
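
The accepted values for S3 Max Error Retry and Azure Max Error Retry follow the same rule, which can be expressed as a small check. The function name is hypothetical; the default of 3 and the accepted range (-1 or a positive integer) come from the descriptions above.

```python
DEFAULT_MAX_ERROR_RETRY = 3


def parse_max_error_retry(text=None):
    """Return the retry count, applying the default and the accepted range."""
    if text is None or text.strip() == "":
        return DEFAULT_MAX_ERROR_RETRY
    try:
        value = int(text.strip())
    except ValueError:
        raise ValueError(f"not an integer: {text!r}") from None
    # Only -1 (maximum possible retries) or positive integers are accepted.
    if value == -1 or value > 0:
        return value
    raise ValueError(f"only -1 or positive integers are accepted: {value}")
```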

Use Custom S3 Connection Configuration

Select this check box if you wish to use your custom S3 configuration.

Option: select the parameter from the list.

Value: enter the parameter value.

This field is available when you select S3 from the Storage drop-down list in the Basic settings view.

Use Custom Stage Prefix

Select this check box to specify the path to the folder (with the current stage as the root) from which the data is loaded. You also need to enter the path to the folder in the field provided. For example, to load data stored in the files that are located in myfolder1/myfolder2 under the stage, you need to type "@~/myfolder1/myfolder2" in the field.

This field is available when you select Internal from the Storage drop-down list in the Basic settings view.

Once selected, the Stage Folder in Basic settings view becomes unavailable.
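
A stage prefix such as the one in the example above can be assembled from folder names. This is a small sketch with a hypothetical helper name; it assumes the @~ root shown in the example and simply normalizes stray slashes.

```python
def stage_prefix(*folders):
    """Build a path such as @~/myfolder1/myfolder2 from folder names."""
    # Split on slashes and drop empty segments to avoid doubled separators.
    parts = [seg for folder in folders for seg in folder.split("/") if seg]
    return "@~/" + "/".join(parts)


prefix = stage_prefix("myfolder1", "myfolder2")
```

The value is then typed in the field in double quotation marks, as in "@~/myfolder1/myfolder2".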

Use Custom Storage Location

Select this check box to specify a folder in an external storage (for example, S3) to load data from. You need to specify the folder in the field next to this option.

Copy Command Options

Set parameters for the COPY INTO command by selecting the following options from the drop-down list. The COPY INTO command is provided by Snowflake. It loads data to a Snowflake database table.
  • Default: Carry out the COPY INTO operation using the default settings, as listed in the frame to the right.
  • Table: Set the COPY INTO operation parameters using the Options table. To set a parameter, click the plus button, select the parameter from the Option column, and set the parameter value in the Value column.
  • Manual: Set the COPY INTO operation parameters in the text frame to the right manually.
For information about the parameters of the COPY INTO command, see the COPY INTO command.
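
To give an idea of what the Table and Manual modes configure, the sketch below assembles a COPY INTO statement from a set of options. It is an illustration only: the statement the component actually generates is not documented here and may differ. The function name, table name and stage path are hypothetical, while FILE_FORMAT, FIELD_DELIMITER, SKIP_HEADER, ON_ERROR and PURGE are existing Snowflake COPY INTO parameters.

```python
def copy_into_statement(table, location, file_format=None, copy_options=None):
    """Assemble a COPY INTO statement from already formatted option values."""
    parts = [f"COPY INTO {table}", f"FROM {location}"]
    if file_format:
        rendered = " ".join(f"{key} = {value}" for key, value in file_format.items())
        parts.append(f"FILE_FORMAT = ({rendered})")
    for option, value in (copy_options or {}).items():
        parts.append(f"{option} = {value}")
    return "\n".join(parts)


# Hypothetical table and stage folder; string values carry their own quotes.
statement = copy_into_statement(
    '"MY_TABLE"',
    "@~/myfolder1/myfolder2",
    file_format={"TYPE": "CSV", "FIELD_DELIMITER": "','", "SKIP_HEADER": 1},
    copy_options={"ON_ERROR": "CONTINUE", "PURGE": "TRUE"},
)
```

Each key-value pair corresponds to one row of the Options table (Option and Value columns) when Table is selected.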

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

NB_LINE

The number of rows processed. This is an After variable and it returns an integer.

NB_SUCCESS

The number of rows successfully processed. This is an After variable and it returns an integer.

NB_REJECT

The number of rows rejected. This is an After variable and it returns an integer.

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule

This component can be used as a standalone component in a Job or a subJob.