tSnowflakeOutputBulk Standard properties - 7.1

Snowflake

Version
7.1
Language
English (United States)
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Data Governance > Third-party systems > Cloud storages > Snowflake components
Data Quality and Preparation > Third-party systems > Cloud storages > Snowflake components
Design and Development > Third-party systems > Cloud storages > Snowflake components

These properties are used to configure tSnowflakeOutputBulk running in the Standard Job framework.

The Standard tSnowflakeOutputBulk component belongs to the Cloud family.

The component in this framework is available in all subscription-based Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related to database settings vary depending on your database type selection. For more information about dynamic database connectors, see Dynamic database components.

Basic settings

Database

Select a type of database from the list and click Apply.

Property Type

Select the way the connection details will be set.

  • Built-In: The connection details will be set locally for this component. You need to specify the values for all related connection properties manually.

  • Repository: The connection details stored centrally in Repository > Metadata will be reused by this component. You need to click the [...] button next to it and in the pop-up Repository Content dialog box, select the connection details to be reused, and all related connection properties will be automatically filled in.

This property is not available when other connection component is selected from the Connection Component drop-down list.

Connection Component

Select the component that opens the database connection to be reused by this component.

Account

In the Account field, enter, within double quotation marks, the name that has been assigned to you by Snowflake.

This field appears when you select Internal from the Storage drop-down list.

Snowflake Region

Select an AWS region or an Azure region from the Snowflake Region drop-down list.

This field appears when you select Internal from the Storage drop-down list.

User Id and Password

Enter, within double quotation marks, your authentication information to log in Snowflake.

  • In the User ID field, enter, within double quotation marks, your login name that has been defined in Snowflake using the LOGIN_NAME parameter of Snowflake. For details, ask the administrator of your Snowflake system.

  • To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

This field appears when you select Internal from the Storage drop-down list.

Warehouse

Enter, within double quotation marks, the name of the Snowflake warehouse to be used. This name is case-sensitive and is normally upper case in Snowflake.

This field is available only when Internal is selected from the Storage drop-down list.

Schema

Enter, within double quotation marks, the name of the database schema to be used. This name is case-sensitive and is normally upper case in Snowflake.

This field appears when you select Internal from the Storage drop-down list.

Database

Enter, within double quotation marks, the name of the Snowflake database to be used. This name is case-sensitive and is normally upper case in Snowflake.

This field appears when you select Internal from the Storage drop-down list.

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

If the Snowflake data type to be handled is VARIANT, OBJECT or ARRAY, while defining the schema in the component, select String for the corresponding data in the Type column of the schema editor wizard.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content window.

Note that if the input value of any non-nullable primitive field is null, the row of data including that field will be rejected.

This component offers the advantage of the dynamic schema feature. This allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. For further information about dynamic schemas, see Talend Studio User Guide.

This dynamic schema feature is designed for the purpose of retrieving unknown columns of a table and is recommended to be used for this purpose only; it is not recommended for the use of creating tables.

Storage Select the type of storage into which data will be loaded.
  • Internal: Select an internal storage to load data. User needs to specify credentials in order to connect to the default Snowflake storage specified in Stage Folder.
  • S3: Select an Amazon S3 bucket to load data.
  • Azure: Select an Azure container to load data.
Stage Folder Specify the path to the default Snowflake storage.

This field appears when you select Internal from the Storage drop-down list.

Region Specify the region where the S3 bucket locates.

This field appears when you select S3 from the Storage drop-down list.

Access Key and Secret Key Enter the authentication information required to connect to the Amazon S3 bucket to be used.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

This field appears when you select S3 from the Storage drop-down list.

Bucket Enter the name of the bucket to be used for storing data. This bucket must already exist.

This field appears when you select S3 from the Storage drop-down list.

Folder Enter the directory in which you want to store data. This directory is created if it does not exist at runtime.

This property is available only when S3 or Azure is selected from the Storage drop-down list.

Server-Side Encryption This property is checked by default. Select this check box to encrypt the files to be uploaded to the S3 bucket on the server side.

This field appears when you select S3 from the Storage drop-down list.

Protocol Select the protocol used to create an Azure connection.

This field appears when you select Azure from the Storage drop-down list.

Account Name Enter the name of the Azure storage account you need to access.

This field appears when you select Azure from the Storage drop-down list.

Container Specify the Azure container used for storing data.

This field appears when you select Azure from the Storage drop-down list.

SAS Token Specify the SAS token to grant limited access to objects in your storage account.

To enter the SAS token, click the [...] button next to the SAS token field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

This field appears when you select Azure from the Storage drop-down list.

Advanced settings

Use Custom Snowflake Region
Select this check box to use the customized Snowflake region.
  • Region ID: enter, within double quotation marks, a region ID, for example eu-west-1 or east-us-2.azure.

For more information on Snowflake Region ID, see https://docs.snowflake.net/manuals/user-guide/intro-regions.html#region-ids-in-account-urls.

Login Timeout

Specify how long to wait for a response when connecting to Snowflake before returning an error.

Tracing

Select the log level for the Snowflake JDBC driver. If enabled, a standard Java log is generated.

Role

Enter, within double quotation marks, the default access control role to use to initiate the Snowflake session.

This role must already exist and has been granted to the user ID you are using to connect to Snowflake. If this field is left empty, the PUBLIC role is automatically granted. For further information about the Snowflake access control model, see Snowflake documentation at Understanding the Access Control Model.

Use Custom Stage Path

Select this check box to use the specified storage to upload data.

This field is available only when Internal is selected from the Storage drop-down list in the Basic Settings. Once selected, the Stage Folder in Basic settings is disabled.

Put Command Options

Select a PUT command mode from the drop-down list:.
  • Default: By default, only read-only string properties with options are used.
  • Table: A table with two columns, Option and Value, are used to configure the command options.
  • Manual: You can set any custom options using a string property.

This field appears when you select Internal from the Storage drop-down list.

Put Command Error Retry

Specify the retry times when there is an error during data uploading with the Put command.

S3 Max Error Retry

Specify the maximum retry times when an error occurs during loading data to the S3 bucket.

This field is available only when S3 is selected from the Storage drop-down list in the basic settings.

Azure Max Error Retry

Specify the maximum retry times when an error occurs during loading data to the Azure container.

This field is available only when Azure is selected from the Storage drop-down list in the basic settings.

Stop on non-empty Storage Folder Select this check box to stop the job when the folder used for storing data is not empty.
Chunk Size (bytes) Specify the size of a CSV file, which is 52428800 bytes by default.

Since records must be fully stored inside a single CSV file, the size of the CSV file can be larger than the specified value, but no more than the size of the last record stored in the file.

Number of file requests threads Specify the number of threads used for sending Put requests in parallel when splitting data in a file by the size specified in the Chunk Size property.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

NB_LINE

The number of rows processed. This is an After variable and it returns an integer.

NB_SUCCESS

The number of rows successfully processed. This is an After variable and it returns an integer.

NB_REJECT

The number of rows rejected. This is an After variable and it returns an integer.

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule

This component is an end component of a data flow in your Job. It receives data from other components through the Row > Main link.