tSQLDWHBulkExec Standard properties - 7.0

Azure SQL Data Warehouse


These properties are used to configure tSQLDWHBulkExec running in the Standard Job framework.

The Standard tSQLDWHBulkExec component belongs to two families: Cloud and Databases.

The component in this framework is available in all Talend products.

Basic settings

Property Type

Select the way the connection details will be set.

  • Built-In: The connection details will be set locally for this component. You need to specify the values for all related connection properties manually.

  • Repository: The connection details stored centrally in Repository > Metadata are reused by this component. Click the [...] button next to the field and, in the Repository Content dialog box that opens, select the connection details to be reused; all related connection properties are then filled in automatically.

Use an existing connection

Select this check box and, in the Component List, select the relevant connection component to reuse the connection details you have already defined.

When a Job contains a parent Job and a child Job and you need to share an existing connection between the two levels (for example, to share the connection created by the parent Job with the child Job), you have to:

  1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.

  2. In the child level, use a dedicated connection component to read that registered database connection.

For an example about how to share a database connection across Job levels, see Talend Studio User Guide.

JDBC Provider

Select the provider of the JDBC driver to be used.

Host

Specify the IP address or hostname of the Azure SQL Data Warehouse to be used.

Port

Specify the listening port number of the Azure SQL Data Warehouse to be used.

Schema

Enter the name of the Azure SQL Data Warehouse schema.

Database

Specify the name of the Azure SQL Data Warehouse to be used.

Username and Password

Enter the user authentication data to access the Azure SQL Data Warehouse.

To enter the password, click the [...] button next to the password field, and then, in the dialog box that opens, enter the password between double quotes and click OK to save the settings.

Additional JDBC Parameters

Specify additional connection properties for the database connection you are creating, as semicolon-separated key=value pairs. For example, encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30 for an Azure SQL database connection.
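
As an illustration only, the following minimal Java sketch (not the code Talend Studio generates) shows how these connection properties typically combine into a JDBC URL for the Microsoft JDBC driver (mssql-jdbc); all host, database, and credential values are placeholders, and the driver is assumed to be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class SqlDwhConnectionSketch {
        public static void main(String[] args) throws Exception {
            String host = "myserver.database.windows.net"; // Host (placeholder)
            int port = 1433;                               // Port
            String database = "mydwh";                     // Database (placeholder)
            // Additional JDBC Parameters: semicolon-separated key=value pairs.
            String extraParams = "encrypt=true;trustServerCertificate=false;"
                    + "hostNameInCertificate=*.database.windows.net;loginTimeout=30";

            String url = "jdbc:sqlserver://" + host + ":" + port
                    + ";databaseName=" + database + ";" + extraParams;

            // Username and Password (placeholders).
            try (Connection conn = DriverManager.getConnection(url, "myuser", "mypassword")) {
                System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
            }
        }
    }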

Table

Specify the name of the SQL Data Warehouse table into which data will be loaded.

Action on table

Select an operation to be performed on the table defined.

  • None: No operation is carried out.

  • Drop and create table: The table is removed and created again.

  • Create table: The table is created. It must not already exist.

  • Create table if not exists: The table is created if it does not exist.

  • Drop table if exists and create: The table is removed if it already exists and created again.

  • Clear table: The table content is deleted using a DELETE statement. You can roll back this operation.

  • Truncate table: The table content is deleted using a TRUNCATE statement. You cannot roll back this operation. Both statements appear in the sketch after this list.
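
As a point of reference, here is a minimal Java sketch of the kind of T-SQL statements these options map to, run over an existing JDBC connection; the table name and columns are illustrative, and this is not the exact code the component generates.

    import java.sql.Connection;
    import java.sql.Statement;

    public class ActionOnTableSketch {
        static void applyActions(Connection conn) throws Exception {
            try (Statement stmt = conn.createStatement()) {
                // Drop table if exists and create:
                stmt.execute("IF OBJECT_ID('dbo.sales', 'U') IS NOT NULL DROP TABLE dbo.sales");
                stmt.execute("CREATE TABLE dbo.sales (id INT, amount DECIMAL(10,2))");

                // Clear table: DELETE removes rows transactionally and can be rolled back.
                stmt.execute("DELETE FROM dbo.sales");

                // Truncate table: TRUNCATE empties the table and, as noted above,
                // cannot be rolled back.
                stmt.execute("TRUNCATE TABLE dbo.sales");
            }
        }
    }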

Schema and Edit schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

  • Built-In: You create and store the schema locally for this component only.

  • Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content dialog box.

Azure Storage

Select the type of the Azure Storage from which data will be loaded, either Blob Storage or Data Lake Store.

Account Name

Enter the account name for your Azure Blob Storage or Azure Data Lake Store to be accessed.

Access key

Enter the key associated with the storage account you need to access. Two keys are available for each account; by default, either of them can be used for this access.

This property is available when Blob Storage is selected from the Azure Storage drop-down list.

Container

Enter the name of the blob container.

This property is available when Blob Storage is selected from the Azure Storage drop-down list.

Authentication key

Enter the authentication key needed to access your Azure Data Lake Store.

This property is available when Data Lake Store is selected from the Azure Storage drop-down list.

Client Id

Enter your application ID (also called client ID).

This property is available when Data Lake Store is selected from the Azure Storage drop-down list.

OAuth 2.0 token endpoint

Paste in the OAuth 2.0 token endpoint, which you can obtain from the Endpoints list on the App registrations page of the Azure portal.

This property is available when Data Lake Store is selected from the Azure Storage drop-down list.

Azure Storage Location

Specify the location where your Azure Blob Storage or Azure Data Lake Store account is created.

Advanced settings

File format

Select the file format that defines the external data stored in your Azure Blob Storage or Azure Data Lake Store: Delimited Text, Hive RCFile, Hive ORC, or Parquet.

For more information about the file formats, see CREATE EXTERNAL FILE FORMAT.

Field separator

Specify the character(s) that indicate the end of each field in the delimited text file.

This property is available when Delimited Text is selected from the File format drop-down list.

Enclosed by

Select this check box and in the field next to it, specify the character that encloses the string in the delimited file.

This property is available when Delimited Text is selected from the File format drop-down list.

Date format

Select this check box and in the field next to it, specify the custom format for all date and time data in the delimited file. For more information about the date format, see CREATE EXTERNAL FILE FORMAT.

This property is available when Delimited Text is selected from the File format drop-down list.

Use type default

Select this check box to store each missing value using the default value of the data type of the corresponding column.

Clear this check box to store each missing value in the delimited file as NULL.

This property is available when Delimited Text is selected from the File format drop-down list.
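
To make the mapping concrete, here is a hedged Java sketch of the kind of CREATE EXTERNAL FILE FORMAT statement these delimited-text options feed into; the format name and option values are illustrative.

    import java.sql.Connection;
    import java.sql.Statement;

    public class ExternalFileFormatSketch {
        static void createDelimitedFormat(Connection conn) throws Exception {
            // Field separator ';', Enclosed by '"', a custom Date format,
            // and Use type default selected.
            String ddl =
                "CREATE EXTERNAL FILE FORMAT DelimitedTextFormat "
              + "WITH (FORMAT_TYPE = DELIMITEDTEXT, "
              + "FORMAT_OPTIONS (FIELD_TERMINATOR = ';', "
              + "STRING_DELIMITER = '\"', "
              + "DATE_FORMAT = 'yyyy-MM-dd HH:mm:ss', "
              + "USE_TYPE_DEFAULT = TRUE))";
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(ddl);
            }
        }
    }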

Serde Method

Select a Hive serializer and deserializer method.

This property is available when Hive RCFile is selected from the File format drop-down list.

Compressed by

Select this check box if external data is compressed, and from the drop-down list displayed next to it, select the compression method.

Data import reject options

Select this check box to specify the following reject options:

  • Reject type: Specify how you want to deal with reject rows.

    • Value: If the number of rejected rows exceeds the value specified in the Reject value field, the load fails.
    • Percentage: If the percentage of rejected rows exceeds the value specified in the Reject value field, the load fails.
  • Reject value: The reject threshold according to the reject type. For Percentage, enter the percent value without the % symbol.

  • Reject sample value: The number of rows to attempt to load before the percentage of rejected rows is recalculated. This value applies when Percentage is selected as the reject type.

For more information about the reject options, see CREATE EXTERNAL TABLE.
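
For illustration, here is a hedged Java sketch of a CREATE EXTERNAL TABLE statement using these reject options; all object names are placeholders, and the referenced data source and file format must already exist.

    import java.sql.Connection;
    import java.sql.Statement;

    public class RejectOptionsSketch {
        static void createExternalTable(Connection conn) throws Exception {
            // Percentage reject type: the load fails if more than 5% of the rows
            // are rejected, with the percentage recalculated every 1000 rows.
            String ddl =
                "CREATE EXTERNAL TABLE dbo.sales_ext (id INT, amount DECIMAL(10,2)) "
              + "WITH (LOCATION = '/sales/', "
              + "DATA_SOURCE = MyAzureStorage, "
              + "FILE_FORMAT = DelimitedTextFormat, "
              + "REJECT_TYPE = PERCENTAGE, "
              + "REJECT_VALUE = 5, "
              + "REJECT_SAMPLE_VALUE = 1000)";
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(ddl);
            }
        }
    }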

Distribution Option

Select the sharding pattern used to distribute data in the table: Round Robin, Hash, or Replicate. For more information about the sharding patterns supported by Azure SQL Data Warehouse, see Azure SQL Data Warehouse - Massively parallel processing (MPP) architecture.

This property is available when any option related to table creation is selected from the Action on table drop-down list.

Distribution Column Name

Specify the name of the distribution column for a hash-distributed table.

This property is available when Hash is selected from the Distribution Option drop-down list.

Table Option

Select the index type of the table: Clustered Columnstore Index, Heap, or Clustered Index. For more information, see Indexing tables in SQL Data Warehouse.

This property is available when any option related to table creation is selected from the Action on table drop-down list.

Index column(s)

Specify the name of one or more key columns in the index. If you specify multiple columns, separate them with commas.

This property is available when Clustered Index is selected from the Table Option drop-down list.

Partition

Select this check box to specify the following partition options:

  • Partition column name: Specify the name of the column used to partition the table.

  • Range: Specify to which side of a boundary each boundary value belongs.

    • Left: A boundary value belongs to the partition on its left; it serves as the inclusive upper bound of that partition.

    • Right: A boundary value belongs to the partition on its right; it serves as the inclusive lower bound of that partition.

  • Partition For Values: Specify the boundary values (separated by commas) used to partition the table.

For more information about the table partition, see Partitioning tables in SQL Data Warehouse.

This property is available when any option related to table creation is selected from the Action on table drop-down list.
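
Taken together, the Distribution Option, Table Option, and Partition settings correspond to the WITH clause of the table-creation statement. A hedged Java sketch, with illustrative names and boundary values:

    import java.sql.Connection;
    import java.sql.Statement;

    public class TableCreationOptionsSketch {
        static void createDistributedTable(Connection conn) throws Exception {
            // Hash distribution on customer_id, a clustered columnstore index,
            // and a RANGE RIGHT partition on order_date. With the Clustered
            // Index table option, the index clause would instead read
            // CLUSTERED INDEX (id).
            String ddl =
                "CREATE TABLE dbo.sales (id INT, customer_id INT, "
              + "amount DECIMAL(10,2), order_date DATE) "
              + "WITH (DISTRIBUTION = HASH(customer_id), "
              + "CLUSTERED COLUMNSTORE INDEX, "
              + "PARTITION (order_date RANGE RIGHT "
              + "FOR VALUES ('2017-01-01', '2018-01-01')))";
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(ddl);
            }
        }
    }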

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_LINE_INSERTED

The number of rows inserted. This is an After variable and it returns an integer.
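
In a Job, these variables are read from the globalMap using the component label as a prefix. A minimal example, assuming the component instance is labeled tSQLDWHBulkExec_1 and the code runs in a tJava component triggered after it:

    // Retrieve the After variables of tSQLDWHBulkExec_1 (illustrative label).
    String errorMessage = (String) globalMap.get("tSQLDWHBulkExec_1_ERROR_MESSAGE");
    Integer rowsInserted = (Integer) globalMap.get("tSQLDWHBulkExec_1_NB_LINE_INSERTED");
    System.out.println("Rows inserted: " + rowsInserted);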

Usage

Usage rule

This component can be used as a standalone component of a Job or Subjob.

Limitation

Note that some features that are supported by other databases are not supported by Azure SQL Data Warehouse. For more information, see Unsupported table features.