Loading the data from the local file - 6.5
HDFS
Procedure
- Double-click tHDFSPut to define the component in its Basic settings view.
- Select, for example, Apache 0.20.2 from the Hadoop version list.
- In the NameNode URI, Username, and Group fields, enter the connection parameters to HDFS. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber; if WebHDFS is secured with SSL, the scheme should be swebhdfs and you need to use a tLibraryLoad component in the Job to load the library required by the secured WebHDFS.
- Next to the Local directory field, click the three-dot [...] button and browse to the folder containing the file to be loaded into HDFS. In this scenario, the directory was specified while configuring tFileOutputDelimited: C:/hadoopfiles/putFile/.
- In the HDFS directory field, type in the intended location in HDFS for the file to be loaded. In this example, it is /testFile.
- Click the Overwrite file field to expand the drop-down list.
- From the menu, select always.
- In the Files area, click the plus button to add a row in which you define the file to be loaded.
- In the File mask column, enter *.txt between the quotation marks to replace the default newLine, and leave the New name column as it is. This allows you to extract all the .txt files in the specified directory without changing their names. In this example, the file is in.txt.
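As a side note, the webhdfs and swebhdfs URI forms mentioned above can be sanity-checked with Python's standard urllib.parse. This is only an illustrative sketch: the host name and port below are placeholders, not values prescribed by this scenario.

```python
from urllib.parse import urlsplit

# Placeholder host and port for illustration only; use your cluster's values.
for uri in ("webhdfs://masternode:50070", "swebhdfs://masternode:50070"):
    parts = urlsplit(uri)
    # The swebhdfs scheme indicates WebHDFS secured with SSL.
    print(parts.scheme, parts.hostname, parts.port)
```

Checking the scheme this way before filling in the NameNode URI field can catch a plain webhdfs URI being used against an SSL-secured WebHDFS endpoint.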
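The *.txt file mask behaves like an ordinary glob pattern. A minimal sketch using Python's fnmatch, with a hypothetical directory listing, shows how in.txt is selected while its name is left unchanged:

```python
import fnmatch

# Hypothetical contents of C:/hadoopfiles/putFile/ for illustration.
files = ["in.txt", "out.csv"]
matched = [name for name in files if fnmatch.fnmatch(name, "*.txt")]
print(matched)  # ['in.txt']
```

Because the New name column is left empty, matched files keep their original names when copied into HDFS.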