HDFS properties

Talend Cloud Apps Connectors Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Inventory, Talend Data Preparation, Talend Pipeline Designer
Content: Administration and Monitoring > Managing connections; Design and Development > Designing Pipelines
Last publication date: 2024-03-21

Properties to configure to connect to a given Hadoop Distributed File System (HDFS).

HDFS connection

Property		Configuration
Selection		Select or enter HDFS.
Configuration
Engine		Select your engine in the list.
Connection	User name	Enter the user name used to authenticate to HDFS.
Description		Enter a display name (mandatory) and a description (optional) for the connection.

HDFS dataset

Property		Configuration
Dataset name		Enter a display name for the dataset. This name will be used as a unique identifier of the dataset in all Talend Cloud apps.
Connection		Select your connection in the list. If you are creating a dataset based on an existing connection, this field is read-only.
HDFS data	Path	Enter the path pointing to the data to be retrieved in the file system.
Format config	Auto detect	Click this button to automatically detect the format of the data to be retrieved.
Format config	Format	Alternatively, select the format of the file to be retrieved in the list, then enter or select the information related to this file format:
- CSV:
  - Record delimiter: Select the type of record separator used in the file to be retrieved. If you select Other, you can enter a custom record delimiter in the Custom record delimiter field.
  - Field delimiter: Select the type of field separator used in the file to be retrieved. If you select Other, you can enter a custom field delimiter in the Custom field delimiter field.
  - Text enclosure character: Enter the character used to enclose the fields.
  - Escape character: Enter the escape character used in the records to be retrieved.
  - Encoding: Select the type of encoding used in the file to be retrieved. If you select Other, you can enter a custom encoding type in the Custom encoding field.
  - Set header: Enable this option if the file to be retrieved contains header lines, then enter or select the number of lines to be skipped in the schema.
- Excel:
  - Excel format: Select the format/version corresponding to the file to be retrieved.
  - Sheet: Enter the name of the specific Excel sheet you want to retrieve.
  - Set header/footer: Enable these options if the file to be retrieved contains header and/or footer lines, then enter or select the number of lines to be skipped in the schema.
- Avro: No specific parameters required for this format.
- Parquet: No specific parameters required for this format.
- JSON: No specific parameters required for this format.
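
To make the role of each CSV parameter more concrete, here is a minimal Python sketch of how a reader might apply them. It is an illustration only, not the connector's implementation: the function name parse_csv, its argument names, and its default values are hypothetical, and it assumes that enclosed fields never contain the record delimiter.

```python
import csv


def parse_csv(raw, *, encoding="utf-8", record_delimiter="\n",
              field_delimiter=";", text_enclosure='"', escape_char="\\",
              header_lines=1):
    """Split raw bytes into (header_rows, data_rows).

    Hypothetical helper: each argument mirrors one CSV property of the
    dataset form (Encoding, Record delimiter, Field delimiter,
    Text enclosure character, Escape character, Set header).
    """
    # Encoding: decode the bytes read from the file system.
    text = raw.decode(encoding)
    # Record delimiter: cut the text into records. Simplification: a
    # record delimiter inside an enclosed field is not supported here.
    records = [record for record in text.split(record_delimiter) if record]
    # Field delimiter, text enclosure and escape character are handed
    # to the standard csv reader.
    reader = csv.reader(records, delimiter=field_delimiter,
                        quotechar=text_enclosure, escapechar=escape_char)
    rows = list(reader)
    # Set header: the first header_lines rows are kept apart from the data.
    return rows[:header_lines], rows[header_lines:]


# Sample file content: CRLF records, semicolon fields, one header line.
sample = b'id;name\r\n1;"Doe; Jane"\r\n2;O\\;Neil\r\n'
header_rows, data_rows = parse_csv(sample, record_delimiter="\r\n")
```

In this sample, the enclosed field and the escaped semicolon both keep their separator as part of the value instead of starting a new field.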

Additional parameters might be displayed depending on whether the connector is used as a source or destination dataset:

For HDFS source datasets:
- Force parallelism—ignore escape char and text enclosure parameters: Enable this option if you want to ignore the escape characters and the characters used to enclose the text in your file.
For HDFS destination datasets:
- Overwrite: Enable this option if the file already exists and you want to overwrite its content.
- Merge output: Enable this option if the file already exists and you want to merge the existing and updated file content.
- Map input column names to output: This option only applies to files with CSV, JSON, or Excel format. It ensures that the input and output field names are identical.
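
As a rough illustration of the Overwrite and Map input column names to output options, the following Python sketch writes records to a local CSV file. It is not the connector's implementation: the function write_csv_output is hypothetical, it targets the local file system rather than HDFS, and it assumes that writing to an existing file fails when overwriting is disabled, which the description above does not specify. Merge output is not covered.

```python
import csv
import os


def write_csv_output(records, path, *, overwrite=False):
    """Write dict records to a CSV file and return the header that was written.

    Hypothetical helper: the header reuses the input field names, so
    input and output column names are identical.
    """
    # Overwrite: refuse to replace an existing file unless asked to
    # (assumed behavior, chosen for this sketch).
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} already exists and overwrite is disabled")
    # Map input column names to output: take the names from the first record.
    fieldnames = list(records[0].keys()) if records else []
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return fieldnames
```

Calling write_csv_output(records, path) a second time on the same path raises FileExistsError, while passing overwrite=True replaces the file content.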