Properties to configure in order to connect to a Hadoop Distributed File
System (HDFS).
HDFS connection
Property | Configuration | |
---|---|---|
Selection | Select or enter HDFS. | |
Configuration | | |
Engine | Select your engine in the list. | |
Connection | User name | Enter the user name used to authenticate to HDFS. |
Description | Enter a display name (mandatory) and a description (optional) for the connection. | |
HDFS dataset
Property | Configuration | |
---|---|---|
Dataset name | Enter a display name for the dataset. This name will be used as a unique identifier of the dataset in all Talend Cloud apps. | |
Connection | Select your connection in the list. If you are creating a dataset based on an existing connection, this field is read-only. | |
HDFS data | Path | Enter the path pointing to the data to be retrieved in the file system. |
Format config | Auto detect | Click this button to automatically detect the format of the data to be retrieved. |
 | Format | Alternatively, select the format of the file to be retrieved from the list, then enter or select the information related to that file format. |
Additional parameters might be displayed depending on whether the connector is used as a source or destination dataset:
- For HDFS source datasets:
  - Force parallelism—ignore escape char and text enclosure parameters: Enable this option if you want to ignore the escape characters and the characters used to enclose the text in your file.
- For HDFS destination datasets:
  - Overwrite: Enable this option if the file already exists and you want to overwrite its content.
  - Merge output: Enable this option if the file already exists and you want to merge the existing and updated file content.
  - Map input column names to output: This option only applies to files with CSV, JSON, or Excel format. It ensures that the input and output field names are identical.
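
The destination options above can be pictured with a minimal Python sketch. It is an illustration only, not the connector's implementation: a local temporary file stands in for an HDFS path, `write_csv` is a hypothetical helper, and "merge" is modeled here as appending new rows to the existing file, which is an assumption about what merging means. The header row is taken from the input column names, which loosely mirrors the idea behind mapping input column names to output.

```python
import csv
import os
import tempfile


def write_csv(path, rows, overwrite=False, merge=False):
    """Hypothetical helper: write rows (a non-empty list of dicts) to a CSV file.

    overwrite=True replaces an existing file; merge=True appends to it
    (assumption: merging is modeled as a plain append).
    """
    if not rows:
        raise ValueError("rows must not be empty")
    if overwrite and merge:
        raise ValueError("choose either overwrite or merge, not both")

    exists = os.path.exists(path)
    if exists and not (overwrite or merge):
        raise FileExistsError(path)

    # Output column names are taken from the input records.
    fieldnames = list(rows[0].keys())
    mode = "a" if (exists and merge) else "w"
    with open(path, mode, newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if mode == "w":
            writer.writeheader()  # a fresh file gets a header row
        writer.writerows(rows)


def read_csv(path):
    """Read the file back as a list of dicts (all values are strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.csv")  # stand-in for a file on HDFS
    write_csv(target, [{"id": 1, "name": "a"}])
    write_csv(target, [{"id": 2, "name": "b"}], merge=True)
    merged = read_csv(target)
    write_csv(target, [{"id": 3, "name": "c"}], overwrite=True)
    overwritten = read_csv(target)
```

In this sketch, writing to an existing file with neither option enabled raises an error, which is one possible way to treat that case; how the connector itself behaves when neither option is selected is not stated here.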