tDBFSPut Standard properties - Cloud - 8.0

Databricks

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2024-02-20

These properties are used to configure tDBFSPut running in the Standard Job framework.

The Standard tDBFSPut component belongs to the Big Data and the File families.

The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.

Basic settings

Property type

Either Built-In or Repository.

Built-In: No property data stored centrally.

Repository: Select the repository file where the properties are stored.

Use an existing connection

Select this check box and, in the Component List, select the HDFS connection component whose connection details you want to reuse.

Note that when a Job contains a parent Job and a child Job, Component List presents only the connection components at the same Job level.

Endpoint

In the Endpoint field, enter the URL of your Azure Databricks workspace. You can find this URL in the Overview blade of your Databricks workspace page in the Azure portal. For example, the URL could look like https://adb-$workspaceId.$random.azuredatabricks.net.

Token

Click the [...] button next to the Token field to enter the authentication token generated for your Databricks user account. You can generate or find this token on the User settings page of your Databricks workspace. For further information, see Personal access tokens in the official Azure documentation.
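To illustrate how an endpoint and a token are typically combined, the following minimal sketch builds (but does not send) a request for the Databricks REST API 2.0 `dbfs/put` operation. This page does not describe how tDBFSPut talks to DBFS internally, so this is an assumption for illustration only; the function name `build_dbfs_put_request` and its parameters are hypothetical.

```python
import base64
import json
import urllib.request


def build_dbfs_put_request(endpoint, token, dbfs_path, data, overwrite=False):
    """Build, without sending, a POST request for the DBFS put API.

    endpoint: workspace URL, as entered in the Endpoint field
    token:    personal access token, as entered in the Token field
    """
    payload = {
        "path": dbfs_path,
        # The put API expects the file content base64-encoded.
        "contents": base64.b64encode(data).decode("ascii"),
        "overwrite": overwrite,
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/api/2.0/dbfs/put",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The token is passed as a bearer token.
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Inline content in this API call is limited in size, so larger files would need a different upload approach.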

DBFS directory

In the DBFS directory field, enter the path in the DBFS file system to which the files are to be loaded.

Local directory

Local directory that stores the files to be loaded into DBFS.

Overwrite file

Select whether to overwrite an existing file with the new one.

Include subdirectories

Select this check box if the local directory contains subdirectories whose files are also to be loaded.
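The interplay of Local directory, DBFS directory, Overwrite file and Include subdirectories can be sketched as follows. This is an illustrative model, not the component's actual implementation: the function `plan_uploads`, the `existing` set of DBFS paths, and the choice to mirror the local subdirectory structure under the DBFS directory are all assumptions.

```python
import os
import posixpath


def plan_uploads(local_dir, dbfs_dir, existing, overwrite=False, include_subdirs=False):
    """Return (local_path, dbfs_path) pairs for the files that would be loaded.

    existing: set of DBFS paths assumed to be present already.
    """
    plan = []
    for root, dirs, files in os.walk(local_dir):
        if include_subdirs:
            dirs.sort()   # visit subdirectories in a stable order
        else:
            dirs[:] = []  # prune: do not descend into subdirectories
        for name in sorted(files):
            local_path = os.path.join(root, name)
            rel = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            target = posixpath.join(dbfs_dir, rel)
            # Skip files that already exist unless overwriting is allowed.
            if target in existing and not overwrite:
                continue
            plan.append((local_path, target))
    return plan
```

With `include_subdirs=False`, only files directly under the local directory are planned; with `overwrite=False`, any target already listed in `existing` is left untouched.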

Files

In the Files area, the fields to be completed are:

- File mask: enter the name of the file to be selected from the local directory. Regular expressions are supported.

- New name: give a new name to the loaded file.
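As a rough illustration of how a file mask and a new name could pair up, the sketch below maps each matching file name to the name it would take after loading. The exact matching semantics of the File mask field are not specified on this page; treating the mask as a Python regular expression that must match the whole file name is an assumption, and `select_files` is a hypothetical helper.

```python
import re


def select_files(names, mask, new_name=None):
    """Map each file name that fully matches 'mask' to its target name.

    If new_name is empty, the file keeps its original name.
    """
    pattern = re.compile(mask)
    return {n: (new_name or n) for n in names if pattern.fullmatch(n)}
```

For example, `select_files(["report.csv", "x.txt"], r"report\.csv", "latest.csv")` maps `report.csv` to `latest.csv`. In this sketch, a mask that matches several files combined with a single new name would map them all to the same target, so a new name is best paired with a mask that selects one file.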

Die on error

Select the check box to stop the execution of the Job when an error occurs.

Clear the check box to skip any rows on error and complete the process for error-free rows.
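The difference between the two settings can be modeled with a short sketch. It is only an approximation of the behavior described above, under the assumption that each file is an independent unit of work; `upload_all` and the `upload` callable are hypothetical names, not part of Talend.

```python
def upload_all(pairs, upload, die_on_error=True):
    """Call upload(local_path, dbfs_path) for each pair.

    With die_on_error=True the first failure stops the run;
    otherwise failures are collected and the remaining pairs are processed.
    """
    failed = []
    for local_path, dbfs_path in pairs:
        try:
            upload(local_path, dbfs_path)
        except Exception as exc:
            if die_on_error:
                raise  # stop the whole run, as with the check box selected
            failed.append((local_path, str(exc)))
    return failed
```

With `die_on_error=False`, the returned list names the items that failed, while the error-free items have still been processed.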

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage

Usage rule

This component combines DBFS connection and data extraction, and is therefore usually used as a single-component subJob that copies data from a user-defined local directory to DBFS.

It runs standalone and does not generate an input or output flow for other components. It is often connected to the Job using an OnSubjobOk or OnComponentOk link, depending on the context.