tDBFSGet Standard properties - 7.3

Databricks

EnrichVersion
Cloud
7.3
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Open Studio for Big Data
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio
task
Design and Development > Designing Jobs > Hadoop distributions > Databricks
Design and Development > Designing Jobs > Serverless > Databricks

These properties are used to configure tDBFSGet running in the Standard Job framework.

The Standard tDBFSGet component belongs to the Big Data and the File families.

The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.

Basic settings

Property type

Either Built-In or Repository.

Built-In: No property data stored centrally.

Repository: Select the repository file where the properties are stored.

Use an existing connection

Select this check box and, from the Component List, select the DBFS connection component whose connection details you want to reuse.

Note that when a Job contains a parent Job and a child Job, the Component List presents only the connection components at the same Job level.

Endpoint

In the Endpoint field, enter the URL address of your Azure Databricks workspace. This URL can be found in the Overview blade of your Databricks workspace page on your Azure portal. For example, this URL could look like https://adb-$workspaceId.$random.azuredatabricks.net.

Token

Click the [...] button next to the Token field to enter the authentication token generated for your Databricks user account. You can generate or find this token on the User settings page of your Databricks workspace. For further information, see Personal access tokens from the Azure documentation.
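For context, a Databricks personal access token authorizes REST calls against the workspace endpoint as a Bearer token. The following sketch is purely illustrative (the endpoint and token values are placeholders, and this is not the component's internal code); it shows how such a header would be built for a DBFS API call:

```python
# Illustrative only: how a Databricks personal access token is sent
# as a Bearer token on REST calls to the workspace endpoint.
# Endpoint and token values are placeholders.

def dbfs_request_headers(token: str) -> dict:
    """Build the Authorization header used by Databricks REST APIs."""
    return {"Authorization": f"Bearer {token}"}

endpoint = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
list_url = f"{endpoint}/api/2.0/dbfs/list"  # DBFS list API endpoint

headers = dbfs_request_headers("dapi0123456789abcdef")  # placeholder token
print(headers["Authorization"])
```

Talend Studio handles this exchange for you once the Endpoint and Token fields are filled in.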

DBFS directory

In the DBFS directory field, enter the path pointing to the data to be used in the DBFS file system.

Local directory

Browse to, or enter, the local directory in which to store the files copied from DBFS.

Overwrite file

Select whether or not to overwrite the existing file with the new one.
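The effect of this option can be sketched as follows; the copy_file helper below is a hypothetical illustration of the overwrite-or-skip behavior, not the component's implementation:

```python
import shutil
from pathlib import Path

# Hypothetical sketch of the overwrite option: when overwrite is off,
# an existing target file is left untouched and the copy is skipped.
def copy_file(src: Path, dst: Path, overwrite: bool) -> bool:
    """Copy src to dst; return False when dst exists and overwrite is off."""
    if dst.exists() and not overwrite:
        return False
    shutil.copy2(src, dst)
    return True
```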

Include subdirectories

Select this check box if the selected input source type includes sub-directories.

Files

In the Files area, the fields to be completed are:

- File mask: enter the name of the file(s) to be selected from DBFS. Regular expressions are supported.

- New name: give a new name to the retrieved file.
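As an illustration of how a regular-expression file mask might select files from a listing, here is a minimal sketch; the file names and the select_files helper are hypothetical and not part of Talend Studio:

```python
import re

# Hypothetical illustration of a regex file mask applied to a listing
# of DBFS file names; all names below are made up.
def select_files(file_names, file_mask, new_name=None):
    """Return (source, target) name pairs for names matching the mask."""
    pattern = re.compile(file_mask)
    selected = [n for n in file_names if pattern.fullmatch(n)]
    return [(n, new_name or n) for n in selected]

files = ["sales_2023.csv", "sales_2024.csv", "readme.txt"]
# A mask such as sales_\d{4}\.csv selects only the two CSV files.
print(select_files(files, r"sales_\d{4}\.csv"))
```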

Die on error

Select the check box to stop the execution of the Job when an error occurs.

Clear the check box to skip any rows on error and complete the process for error-free rows.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage

Usage rule

This component combines DBFS connection and data extraction; it is therefore typically used as a single-component subJob to copy data from DBFS to a user-defined local directory.

It runs standalone and does not generate an input or output flow for other components. It is often connected to the rest of the Job using an OnSubjobOk or OnComponentOk link, depending on the context.