HDFS properties - Cloud

Talend Cloud Apps Connectors Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Inventory, Talend Data Preparation, Talend Pipeline Designer
Content: Administration and Monitoring > Managing connections; Design and Development > Designing Pipelines
Last publication date: 2024-03-21
Properties to configure to connect to a given Hadoop Distributed File System (HDFS).

HDFS connection

Property: Configuration
Selection: Select or enter HDFS.
Configuration > Engine: Select your engine in the list.
Configuration > Connection > User name: Enter the user name used to authenticate to HDFS.
Description: Enter a display name (mandatory) and a description (optional) for the connection.

HDFS dataset

Property: Configuration
Dataset name: Enter a display name for the dataset. This name will be used as a unique identifier of the dataset in all Talend Cloud apps.
Connection: Select your connection in the list. If you are creating a dataset based on an existing connection, this field is read-only.
HDFS data > Path: Enter the path pointing to the data to be retrieved in the file system.
Format config > Auto detect: Click this button to automatically detect the format of the data to be retrieved.
Format config > Format: Alternatively, select in the list the format of the file to be retrieved and enter or select the information related to this file format:
  • CSV:
    • Record delimiter: Select the type of record separator used in the file to be retrieved. If you select Other, you will be able to enter a custom record delimiter in the Custom record delimiter field.
    • Field delimiter: Select the type of field separator used in the file to be retrieved. If you select Other, you will be able to enter a custom field delimiter in the Custom field delimiter field.
    • Text enclosure character: Enter the character used to enclose the fields.
    • Escape character: Enter the character used to escape special characters in the records to be retrieved.
    • Encoding: Select the type of encoding used in the file to be retrieved. If you select Other, you will be able to enter a custom encoding type in the Custom encoding field.
    • Set header: Enable this option if the file to be retrieved contains header lines, then enter or select the number of header lines to skip.
  • Excel:
    • Excel format: Select the format/version corresponding to the file to be retrieved.
    • Sheet: Enter the name of the specific Excel sheet you want to be retrieved.
    • Set header/footer: Enable these options if the file to be retrieved contains header and/or footer lines, then enter or select the number of lines to skip.
  • Avro: No specific parameters required for this format.
  • Parquet: No specific parameters required for this format.
  • JSON: No specific parameters required for this format.
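To make the CSV parameters above more concrete, here is a minimal Python sketch of how a field delimiter, text enclosure character, escape character, encoding, and header line count typically interact when a file is parsed. It uses Python's standard csv module, not Talend code. The setting names and the sample data are assumptions made for illustration only.

```python
import csv
import io
from typing import Dict, List

# Hypothetical settings mirroring the CSV properties listed above.
# The key names are illustrative and are not Talend identifiers.
CSV_SETTINGS = {
    "field_delimiter": ";",
    "text_enclosure": '"',
    "escape_character": "\\",
    "encoding": "utf-8",
    "header_lines": 1,
}


def read_csv_bytes(raw: bytes, settings: Dict) -> List[List[str]]:
    """Decode raw bytes and parse them as CSV, skipping the header lines."""
    text = raw.decode(settings["encoding"])
    # Python's csv reader recognizes "\n" and "\r\n" record endings on its
    # own, so no record delimiter setting is passed here.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=settings["field_delimiter"],
        quotechar=settings["text_enclosure"],
        escapechar=settings["escape_character"],
    )
    rows = list(reader)
    return rows[settings["header_lines"]:]


# Made-up sample: one header line, one enclosed field, one escaped delimiter.
SAMPLE = b'id;name\n1;"Doe; Jane"\n2;back\\;slash\n'
ROWS = read_csv_bytes(SAMPLE, CSV_SETTINGS)
```

In this sketch the enclosed field keeps its inner semicolon, the escaped semicolon is read as data rather than as a separator, and the single header line is dropped from the result.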
Additional parameters might be displayed depending on whether the connector is used as a source or destination dataset:
  • For HDFS source datasets:
    • Force parallelism—ignore escape char and text enclosure parameters: Enable this option if you want to ignore the escape characters and the characters used to enclose the text in your file.
  • For HDFS destination datasets:
    • Overwrite: Enable this option if the file already exists and you want to overwrite its content.
    • Merge output: Enable this option if the file already exists and you want to merge the existing and updated file content.
    • Map input column names to output: This option only applies to files with CSV, JSON, or Excel format. It ensures that the input and output field names are identical.
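The effect of ignoring the escape and text enclosure characters, as the Force parallelism option for source datasets does, can be illustrated with a short Python sketch. One plausible reading of the option name, which this page does not state, is that once enclosures are ignored every line break marks a record boundary, so lines can be processed independently. The sample data is made up, and the code uses Python's standard csv module rather than anything from Talend.

```python
import csv
import io

# Made-up sample: the second record has a line break inside an enclosed field.
sample = 'id,comment\n1,"line one\nline two"\n2,plain\n'

# Enclosure-aware parsing: the line break inside the quotes stays in the
# field, so the file must be read sequentially to find record boundaries.
aware = list(csv.reader(io.StringIO(sample, newline="")))

# Enclosure-ignoring parsing: every line break ends a record, so each line
# could be handed to a different worker without looking at its neighbors.
naive = [line.split(",") for line in sample.splitlines()]

print(len(aware), "records when the enclosure is honored")
print(len(naive), "records when the enclosure is ignored")
```

The two approaches return different record counts for this sample, which suggests the option is only safe when the fields contain no enclosed delimiters or line breaks.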