tAzureAdlsGen2Output Standard properties - 7.3

Azure Data Lake Store

Version
7.3
Language
English (United States)
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio

These properties are used to configure tAzureAdlsGen2Output running in the Standard Job framework.

The Standard tAzureAdlsGen2Output component belongs to the Cloud family.

The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.

Basic settings

Property Type

Select the way the connection details will be set.

  • Built-In: The connection details will be set locally for this component. You need to specify the values for all related connection properties manually.

  • Repository: The connection details stored centrally in Repository > Metadata will be reused by this component. You need to click the [...] button next to it and in the pop-up Repository Content dialog box, select the connection details to be reused, and all related connection properties will be automatically filled in.

Schema and Edit schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, do not use the reserved word "line" as a field name.

  • Built-In: You create and store the schema locally for this component only.

  • Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema.
Note: If you make changes, the schema automatically becomes built-in.
  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content window.

Sync columns

Click this button to retrieve the schema from the previous component connected in the Job.

Authentication method

Select one of the following authentication methods from the drop-down list.

  • Shared key, which requires an account access key. See Manage a storage account for related information.
  • Shared Access Signature, which requires a shared access signature. See Constructing the Account SAS URI for related information.
  • Azure Active Directory, which uses Azure Active Directory authentication when establishing the connection. See Azure AD Authentication for related information.
Note: The Azure Active Directory option is available only if you have installed the R2020-06 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.

Account name

Enter the name of the Data Lake Storage account you need to access. Ensure that the administrator of the system has granted you the appropriate access permissions to this account.

Endpoint suffix

Enter the Azure Storage service endpoint.

The combination of the account name and the Azure Storage service endpoint forms the endpoint of the storage account.
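For illustration only, the sketch below joins an account name and an endpoint suffix the way this paragraph describes. It assumes an HTTPS URL and uses `dfs.core.windows.net`, the Azure public-cloud endpoint for Data Lake Storage Gen2, as an example suffix; the helper name `storage_account_endpoint` is hypothetical and is not part of the component.

```python
def storage_account_endpoint(account_name: str, endpoint_suffix: str) -> str:
    """Join an account name and an endpoint suffix into an HTTPS endpoint URL.

    Hypothetical helper for illustration; the component builds its endpoint itself.
    """
    account = account_name.strip()
    suffix = endpoint_suffix.strip().lstrip(".")  # tolerate a leading dot in the suffix
    if not account or not suffix:
        raise ValueError("both the account name and the endpoint suffix are required")
    return f"https://{account}.{suffix}"


print(storage_account_endpoint("mystorageaccount", "dfs.core.windows.net"))
# https://mystorageaccount.dfs.core.windows.net
```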

Shared key

Enter the key associated with the storage account you need to access. Two keys are available for each account, and by default either of them can be used for this access. For how to get your key, read Manage a storage account.

This field is available if you select Shared key from the Authentication method drop-down list.

SAS token

Enter your account SAS token. You can get the SAS token for each allowed service on the Microsoft Azure portal after generating the SAS. The SAS token format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue, or table), and <$sastoken> is the SAS token value. For more information, read Constructing the Account SAS URI.

This field is available if you select Shared Access Signature from the Authentication method drop-down list.
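To see how the parts of the SAS format fit together, the following sketch splits a SAS URI into the storage account name, the service, and the token. It is a hypothetical helper (`split_sas_uri`), not the component's parser, and it assumes a host of the form `<storagename>.<service>.core.windows.net`, with the token following the host either as a `?` query string or as a plain path segment.

```python
from urllib.parse import urlsplit

# Services this page lists as possible values of <$service>.
ALLOWED_SERVICES = {"blob", "file", "queue", "table"}
HOST_SUFFIX = "core.windows.net"


def split_sas_uri(sas_uri: str) -> tuple[str, str, str]:
    """Return (storage account name, service, SAS token) for an account SAS URI."""
    parts = urlsplit(sas_uri)
    labels = (parts.hostname or "").split(".")
    # Expect exactly <storagename>.<service>.core.windows.net
    if len(labels) != 5 or ".".join(labels[2:]) != HOST_SUFFIX:
        raise ValueError(f"unexpected host: {parts.hostname!r}")
    storage_name, service = labels[0], labels[1]
    if service not in ALLOWED_SERVICES:
        raise ValueError(f"unexpected service: {service!r}")
    # Prefer the query string; fall back to the path when the URI has no '?'.
    token = parts.query or parts.path.lstrip("/")
    if not token:
        raise ValueError("no SAS token found after the host")
    return storage_name, service, token


print(split_sas_uri("https://myaccount.blob.core.windows.net/?sv=2019-02-02&sig=abc"))
# ('myaccount', 'blob', 'sv=2019-02-02&sig=abc')
```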

Check connection

Click this button to validate the connection parameters provided.

Filesystem

Enter the name of the target Blob container.

You can also click the ... button to the right of this field and select the desired Blob container from the list in the dialog box.

Blobs Path

Enter the path to the target blobs.

Format

Set the format for the incoming data. Currently, the following formats are supported: CSV, AVRO, JSON, and Parquet.

Field Delimiter

Set the field delimiter. You can select Semicolon, Comma, Tabulation, or Space from the drop-down list, or select Other and enter your own delimiter in the Custom field delimiter field.

Record Separator

Set the record separator. You can select LF, CR, or CRLF from the drop-down list, or select Other and enter your own separator in the Custom Record Separator field.
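The named options correspond to literal characters. The sketch below spells out that correspondence and how an Other value takes over, then joins two sample rows with the resulting delimiter and separator. The dictionary and function names are hypothetical; the characters are the standard meanings of the option names (for example, CRLF is a carriage return followed by a line feed).

```python
# Literal characters behind each drop-down choice (standard meanings of the names).
FIELD_DELIMITERS = {"Semicolon": ";", "Comma": ",", "Tabulation": "\t", "Space": " "}
RECORD_SEPARATORS = {"LF": "\n", "CR": "\r", "CRLF": "\r\n"}


def resolve(choice: str, presets: dict[str, str], custom: str = "") -> str:
    """Return the literal string for a drop-down choice; 'Other' uses the custom value."""
    if choice == "Other":
        if not custom:
            raise ValueError("'Other' requires a custom value")
        return custom
    return presets[choice]


delimiter = resolve("Semicolon", FIELD_DELIMITERS)
separator = resolve("CRLF", RECORD_SEPARATORS)
rows = [["id", "name"], ["1", "Alice"]]
print(repr(separator.join(delimiter.join(r) for r in rows) + separator))
# 'id;name\r\n1;Alice\r\n'
```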

Text Enclosure Character

Enter the character used to enclose text.

Escape character

Enter the character used to escape special characters in the rows.

Header

Select this check box to add a header row to the data. The schema column names are used as the column headers.
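The enclosure, escape, and header options behave much like the matching parameters of Python's standard `csv` module, so the sketch below uses it to preview one possible output. This is an analogy, not the component's writer: the schema, the sample row, and the choice of `;`, LF, `"` and `\` are assumptions, and the component may quote or escape fields differently.

```python
import csv
import io

schema_columns = ["id", "comment"]   # hypothetical schema column names
rows = [[1, 'says "hi"; bye']]       # hypothetical incoming row

buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter=";",              # Field Delimiter
    lineterminator="\n",        # Record Separator (LF)
    quotechar='"',              # Text Enclosure Character
    escapechar="\\",            # Escape character
    doublequote=False,          # escape embedded enclosure characters instead of doubling them
    quoting=csv.QUOTE_MINIMAL,  # enclose only fields that need it
)
writer.writerow(schema_columns)  # Header: schema column names as the first row
writer.writerows(rows)

output = buffer.getvalue()
print(output, end="")
# id;comment
# 1;"says \"hi\"; bye"
```

Reading the output back with the same settings returns the original values, which is a quick way to check that a chosen enclosure and escape combination is consistent.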

File Encoding

Select the file encoding from the drop-down list.
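The chosen encoding decides which bytes are written for non-ASCII characters, and the file must be read back with the same encoding. The short sketch below shows this with a hypothetical row and two common encodings, UTF-8 and ISO-8859-1; these are examples, not the list of values offered in the drop-down.

```python
record = "1;Zürich\n"  # hypothetical row containing a non-ASCII character

utf8_bytes = record.encode("utf-8")
latin1_bytes = record.encode("iso-8859-1")
print(len(utf8_bytes), len(latin1_bytes))  # 10 9

# Reading UTF-8 bytes back with the wrong encoding garbles the text.
print(utf8_bytes.decode("iso-8859-1"), end="")  # 1;ZÃ¼rich
```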

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Max batch size

Set the maximum number of lines allowed in each batch.

Do not change the default value unless you are facing performance issues. Increasing the batch size can improve performance, but a value that is too high could cause the Job to fail.
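One way to picture this setting is as splitting the incoming rows into consecutive groups of at most Max batch size rows before each write. The sketch below shows only that grouping; the function name is hypothetical, and treating each group as one write is an assumption about the component's behavior rather than something stated here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches holding at most max_batch_size rows each."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, max_batch_size))
        if not batch:
            return
        yield batch


rows = [f"row{i}" for i in range(7)]
print([len(b) for b in batches(rows, 3)])  # [3, 3, 1]
```

In this sketch, a larger batch size means fewer, larger groups, each held in memory in full until it is processed.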

Blob Template Name

Enter a string to use as the name prefix for the generated Blob files. Each generated Blob file is named with this prefix followed by another string.

Global Variables

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_LINE

The number of rows successfully processed. This is an After variable and it returns an integer.

Usage

Usage rule

This component is usually used as an end component of a Job or subJob, and it always requires an input link.