tAzureSynapseBulkExec Standard properties - 7.3

Azure Synapse Analytics

Version
7.3
Language
English (United States)
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Data Governance > Third-party systems > Cloud storages > Azure components > Azure Synapse Analytics components
Data Quality and Preparation > Third-party systems > Cloud storages > Azure components > Azure Synapse Analytics components
Design and Development > Third-party systems > Cloud storages > Azure components > Azure Synapse Analytics components

These properties are used to configure tAzureSynapseBulkExec running in the Standard Job framework.

The Standard tAzureSynapseBulkExec component belongs to two families: Cloud and Databases.

The component in this framework is available in all Talend products.

Basic settings

Property Type

Select the way the connection details will be set.

  • Built-In: The connection details will be set locally for this component. You need to specify the values for all related connection properties manually.

  • Repository: The connection details stored centrally in Repository > Metadata will be reused by this component. You need to click the [...] button next to it and in the pop-up Repository Content dialog box, select the connection details to be reused, and all related connection properties will be automatically filled in.

Use an existing connection

Select this check box and in the Component List drop-down list, select the desired connection component to reuse the connection details you already defined.

When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:

  1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.

  2. In the child level, use a dedicated connection component to read that registered database connection.

For an example about how to share a database connection across Job levels, see Talend Studio User Guide.

JDBC Provider

Select the provider of the JDBC driver to be used.

Host

Enter the IP address or the hostname of the database server or the Azure Synapse Analytics to be used.

If the SQL Server Browser service is running on the machine where the server resides, you can connect to a named instance through a TCP dynamic port by providing the host name and the instance name in this field in the format of {host_name}\{instance_name}. In this case, you can leave the Port field empty. See SQL Server Browser service for related information.

Port

Enter the listening port number of the database server or the Azure Synapse Analytics to be used.

If the SQL Server Browser service is running on the machine where the server resides, you can connect to a named instance through a TCP dynamic port by providing the host name and the instance name in the Host field and leave this field empty. See SQL Server Browser service for related information.

Schema

Enter the name of the Azure Synapse Analytics schema.

Database

Specify the name of the Azure Synapse Analytics to be used.

Username and Password

Enter the authentication data.

To enter the password, click the [...] button next to the Password field, enter the password in double quotes in the pop-up dialog box, and then click OK.

You can use Type 2 integrated authentication on Windows by adding integratedSecurity=true in the Additional JDBC Parameters field and leave these two fields empty. See section Connecting with integrated authentication On Windows at Building the connection URL for related information.

Additional JDBC Parameters

Specify additional connection properties for the database connection you are creating. The properties are separated by semicolon and each property is a key-value pair. For example, encrypt=true;trustServerCertificate=false; hostNameInCertificate=*.database.windows.net;loginTimeout=30; for Azure SQL database connection.

Table

Specify the name of the SQL Data Warehouse table into which data will be loaded.

Action on table

Select an operation to be performed on the table defined.

  • None: No operation is carried out.

  • Drop and create table: The table is removed and created again.

  • Create table: The table does not exist and gets created.

  • Create table if not exists: The table is created if it does not exist.

  • Drop table if exists and create: The table is removed if it already exists and created again.

  • Clear table: The table content is deleted. You have the possibility to rollback the operation.

  • Truncate table: The table content is deleted. You do not have the possibility to rollback the operation.

Schema and Edit schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

  • Built-In: You create and store the schema locally for this component only.

  • Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema.
Note: If you make changes, the schema automatically becomes built-in.
  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content window.

Load method

Set the way for loading data. Two options are provided: Copy and PloyBase. See Data loading strategies for Synapse SQL pool for related information.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
Azure Storage

Select the type of the Azure Storage from which data will be loaded. You can select Blob Storage, Data Lake Storage Gen1, or Data Lake Storage Gen2. Note that the Data Lake Storage Gen1 option is not available if Copy is selected from the Load method drop-down list.

Note: The Data Lake Storage Gen2 option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

Authentication method

Select one of the following authentication method from the drop-down list.
  • Shared access signatures, which requires a shared access signature. See Constructing the Account SAS URI for related information.
  • Storage account key, which requires an account access key. See Manage a storage account for related information.
  • Azure Active Directory, available when Authenticate using Azure Active Directory is selected in the Advanced settings view.

This option is available when Copy is selected from the Load method drop-down list.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
Account Name

Enter the account name for your Azure Blob Storage or Azure Data Lake Storage to be accessed.

Access key

Enter the key associated with the storage account you need to access. Two keys are available for each account and by default, either of them can be used for this access. Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

This property is available when Blob Storage is selected from the Azure Storage drop-down list.

SAS token

Enter the SAS token value. For related information, see Constructing the Account SAS URI. This option is only available to the Shared access signatures option of the Authentication method drop-down list.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

Endpoint suffix

Enter the Azure Storage service endpoint.

The combination of the account name and the Azure Storage service endpoint forms the endpoint of the storage account.

The default value varies with the Azure Storage setting, that is, "dfs.core.windows.net" for Data Lake Storage Gen2 and "blob.core.windows.net" for Blob Storage.

This field is only available to the Copy option of the Load method drop-down list.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
Container

Enter the name of the blob container.

This property is available when Blob Storage is selected from the Azure Storage drop-down list.

External paths

Enter external paths to load data from. You can add multiple external paths by clicking the plus button on the bottom. Note that the external paths are relative paths with the container as the root.

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
Client Id

Enter your application ID (also called client ID).

This field is available when Data Lake Storage Gen1 is selected from the Azure Storage drop-down list.

OAuth 2.0 token endpoint

In the Token endpoint field, copy-paste the OAuth 2.0 token endpoint that you can obtain from the Endpoints list accessible on the App registrations page on your Azure portal.

This field is available when Data Lake Storage Gen1 is selected from the Azure Storage drop-down list.

Azure Storage Location

Specify the location where your Azure Blob Storage or Azure Data Lake Storage account is created.

Advanced settings

Secure transfer required

Select this option to use secure transfer for Azure storage connection.

This option is available when Data Lake Storage Gen2 is selected from the Azure Storage drop-down list and PolyBase is selected from the Load method drop-down list in the Basic settings view.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

Authenticate using Azure Active Directory

Select this option to use Azure Active Directory authentication when establishing the connection. See Azure AD Authentication for related information.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
File format

Select the file format that defines external data stored in your Azure Blob Storage or Azure Data Lake Storage, Delimited Text, Hive RCFile, Hive ORC, or Parquet.

This option is available when PolyBase is selected from the Load method drop-down list in the Basic settings view.

For more information about the file format, see CREATE EXTERNAL FILE FORMAT.

File type

Select the file type that defines external data stored in your Azure Blob Storage or Azure Data Lake Storage, CSV, ORC, or Parquet.

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view.

For more information about the file format, see CREATE EXTERNAL FILE FORMAT.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

Specify map to source table fields

Select this option to map the fields of the source file to specific schema columns by entering schema columns and field indexes in the table that appears. Note that the field indexes of the source file need to be given in an ascendant order in the table.

If this option is not selected, the fields of the source file will be mapped to the schema columns in the default order.

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

First row

Set the number of rows in the source file to be treated as the header. The default is 1.

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view and CSV option is selected from the File type drop-down list.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
Field quote

Specify the character that will be used as the quote character (string delimiter) in the source file.

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view and CSV option is selected from the File type drop-down list.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
Field terminator

Specify the field terminator, which indicates the end of each field in the source file.

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view and CSV option is selected from the File type drop-down list.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
Row terminator

Specify the row terminator, which indicates the end of each row in the source file.

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view and CSV option is selected from the File type drop-down list.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
Date format

Specify the date format. The default is Session date format.

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view and CSV option is selected from the File type drop-down list.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
Encoding

Select the encoding from the list, either UTF8 or UTF16.

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view and CSV option is selected from the File type drop-down list.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

Identity insert

Select this option to insert the identity values in the source file to the identity column. For related information, see COPY (Transact-SQL).

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view.

Note: This option is available only if you have installed the R2020-05 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

Max errors

Enter the maximum number of errors allowed that will not stop the process.

This field is only available when Copy is selected from the Load method drop-down list in the Basic settings view.

Field separator

Specify the character(s) that indicate the end of each field in the delimited text file.

This property is available when PolyBase is selected from the Load method drop-down list in the Basic settings view and Delimited Text is selected from the File format drop-down list.

Enclosed by

Select this check box and in the field next to it, specify the character that encloses the string in the delimited file.

This property is available when PolyBase is selected from the Load method drop-down list in the Basic settings view and Delimited Text is selected from the File format drop-down list.

Date format

Select this check box and in the field next to it, specify the custom format for all date and time data in the delimited file. For more information about the date format, see CREATE EXTERNAL FILE FORMAT.

This property is available when PolyBase is selected from the Load method drop-down list in the Basic settings view and Delimited Text is selected from the File format drop-down list.

Use type default

Select this check box to store each missing value using the default value of the data type of the corresponding column.

Clear this check box to store each missing value in the delimited file as NULL.

This property is available when PolyBase is selected from the Load method drop-down list in the Basic settings view and Delimited Text is selected from the File format drop-down list.

Serde Method

Select a Hive serializer and deserializer method.

This property is available when Hive RCFile is selected from the File format drop-down list.

Compressed by

Select this check box if external data is compressed, and from the drop-down list displayed next to it, select the compression method.

Data import reject options

Select this check box to specify the following reject options.

  • Reject type: Specify how you want to deal with reject rows.

    • Value: If the number of rejected rows exceeds the value specified in the Reject value field, the load fails.
    • Percentage: If the percentage of rejected rows exceeds the value specified in the Reject value field, the load fails.
  • Reject value: The reject value according to the reject type. For percentage, it is the percent value without the symbol %.

  • Reject sample value: The reject percentage sample value.

This option is only available to the PolyBase option of the Load method drop-down list.

For more information about the reject options, see CREATE EXTERNAL TABLE.

Distribution Option

Select the sharding pattern used to distribute data in the table, Round Robin, Hash, or Replicate. For more information about the sharding pattern supported by Azure Synapse Analytics, see Azure Synapse Analytics - Massively parallel processing (MPP) architecture.

This property is available when any option related to table creation is selected from the Action on table drop-down list.

Distribution Column Name

The name of the distribution column for a hash-distribution table.

This property is available when Hash is selected from the Distribution Option drop-down list.

Table Option

Select the index type of the table, Clustered Columnstore Index, Heap, or Clustered Index. For more information, see Indexing tables in SQL Data Warehouse.

This property is available when any option related to table creation is selected from the Action on table drop-down list.

Index column(s)

Specify the name of one or more key columns in the index. If multiple columns are specified, separate them with comma.

This property is available when Clustered Index is selected from the Table Option drop-down list.

Partition

Select this check box to specify the following partition options:

  • Partition column name: Specify the name of the column used to partition the table.

  • Range: Specify how the limit value is included in the range of the limit.

    • Left: The limit value is included in the left range of the limit.

    • Right: The limit value is included in the right range of the limit.

  • Partition For Values: Specify the values (separated by comma) used for partition.

For more information about the table partition, see Partitioning tables in SQL Data Warehouse.

This property is available when any option related to table creation is selected from the Action on table drop-down list.

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_LINE_INSERTED

The number of rows inserted. This is an After variable and it returns an integer.

Usage

Usage rule

This component can be used as a standalone component of a Job or subJob.

Limitation

Note that some features that are supported by other databases are not supported by Azure Synapse Analytics. For more information, see Unsupported table features.