Configuring the connection to the file system to be used by Spark - Cloud - 8.0

Version: Cloud 8.0
Language: English
Products: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2024-02-29

Skip this section if you are using Google Dataproc or HDInsight: for these two distributions, this connection is configured in the Spark configuration tab.

Procedure

  1. Double-click tHDFSConfiguration to open its Component view.

    Spark uses this component to connect to the HDFS system to which the JAR files the Job depends on are transferred.

  2. If you have defined the HDFS connection metadata under the Hadoop cluster node in the Repository, select Repository from the Property type drop-down list, then click the [...] button to select, from the Repository Content wizard, the HDFS connection you have defined.

    For further information about setting up a reusable HDFS connection, see Centralizing HDFS metadata.

    If you complete this step, you can skip the remaining steps of this procedure: all the required fields of tHDFSConfiguration should have been filled automatically.

  3. In the Version area, select the Hadoop distribution you need to connect to and its version.
  4. In the NameNode URI field, enter the location of the machine hosting the NameNode service of the cluster. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber. Note that WebHDFS with SSL is not supported yet.
  5. In the Username field, enter the authentication information used to connect to the HDFS system. Note that the user name must be the same as the one you entered in the Spark configuration tab. Both this user name and the NameNode URI from the previous step appear in the sketch after this procedure.
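To make the NameNode URI and user name settings concrete, the following minimal Java sketch shows the equivalent Spark-side wiring that tHDFSConfiguration takes care of for you. It is an illustration only, not the code Talend Studio generates; the values hdfs://namenode:8020, hdfs_user, and the sample input path are placeholder assumptions.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class HdfsConnectionSketch {
        public static void main(String[] args) {
            // Must match the user name set in the Spark configuration tab
            // (step 5); "hdfs_user" is a placeholder.
            System.setProperty("HADOOP_USER_NAME", "hdfs_user");

            SparkConf conf = new SparkConf()
                    .setAppName("hdfs-connection-sketch")
                    .setMaster("local[*]"); // assumption: local run for illustration

            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // NameNode URI from step 4; for WebHDFS use
                // webhdfs://masternode:portnumber instead.
                sc.hadoopConfiguration().set("fs.defaultFS", "hdfs://namenode:8020");

                // Any HDFS path the Job reads or writes now resolves
                // against this file system.
                long lines = sc.textFile("/user/hdfs_user/input.txt").count();
                System.out.println("Lines read: " + lines);
            }
        }
    }

Because the Job transfers its JAR files to HDFS under this identity, the user name set here must agree with the one in the Spark configuration tab.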