Setting reusable Hadoop properties - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-29
Available in: Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform

About this task

When setting up a Hadoop connection, you can define a set of common Hadoop properties that are reused by its child connections to individual Hadoop elements such as Hive, HDFS, or HBase.

For example, suppose that the HDFS High Availability (HA) feature is defined in the hdfs-site.xml file of the Hadoop cluster you need to use. To enable this feature in Talend Studio, you need to set the corresponding properties in the connection wizard. These properties can also be set in a specific Hadoop-related component; that approach is explained in the article Enabling the HDFS High Availability feature in Talend Studio. This section presents the connection wizard approach only.

Prerequisites:

  • Launch the Hadoop distribution you need to use and ensure that you have the proper access permissions to that distribution.

  • The High Availability properties to be set in Talend Studio have been defined in the hdfs-site.xml file of the cluster to be used.

In this example, the High Availability properties are:
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode90,namenode96</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode90</name>
  <value>hdp-ha:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode96</name>
  <value>hdp-ha2:8020</value>
</property>

The values of these properties are for demonstration purposes only.
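
The Hadoop properties table takes one name/value pair per row, so it can help to extract those pairs from the cluster file rather than copy them by hand. The following is a minimal sketch, not part of Talend Studio: it assumes the demonstration properties above are wrapped in the <configuration> root element that a real hdfs-site.xml uses, and the names HDFS_SITE_FRAGMENT and read_hadoop_properties are hypothetical, chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

# Demonstration values copied from the example above. A real hdfs-site.xml
# wraps its <property> elements in a single <configuration> root element.
HDFS_SITE_FRAGMENT = """
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>nameservice1</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.nameservice1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.nameservice1</name>
    <value>namenode90,namenode96</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.nameservice1.namenode90</name>
    <value>hdp-ha:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.nameservice1.namenode96</name>
    <value>hdp-ha2:8020</value>
  </property>
</configuration>
"""


def read_hadoop_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop XML config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


properties = read_hadoop_properties(HDFS_SITE_FRAGMENT)

# One line per row to enter in the Hadoop properties table.
for name, value in properties.items():
    print(f"{name}\t{value}")
```

To use it against an actual cluster file, read the file contents into a string and pass that string to read_hadoop_properties instead of the fragment.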

To set these properties in the Hadoop connection, open the Hadoop Cluster Connection wizard from the Hadoop cluster node of the Repository. For further information about how to access this wizard, see Centralizing a Hadoop connection.

Procedure

  1. Configure the connection to the Hadoop cluster to be used as explained in the previous sections, if you have not already done so.
  2. Click the [...] button next to Hadoop properties to open the Hadoop properties table.
  3. Add the High Availability properties listed above to this table.
  4. Click OK to validate the changes. These properties are then listed next to the [...] button.
  5. Click the Check services button to verify the connection.
    A dialog box opens to show the checking process and the connection status. If it shows that the connection has failed, review and update the connection information you have defined in the connection wizard.
  6. Click Finish to validate the connection.
    When you then create a child connection from this Hadoop connection, for example to Hive, these High Availability properties are inherited there as read-only parent properties.
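
If the service check fails, one thing worth reviewing is whether the High Availability properties are consistent with each other: each nameservice needs a failover proxy provider and a list of NameNodes, and each listed NameNode needs its own RPC address. The sketch below illustrates that cross-check outside Talend Studio, assuming the property naming pattern of the example in this section. The function name missing_ha_properties and the dictionary layout are hypothetical, and the values are the demonstration values from above.

```python
def split_list(value):
    """Split a comma-separated property value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def missing_ha_properties(props):
    """Return the HA property names that the given property set still lacks."""
    nameservices = split_list(props.get("dfs.nameservices", ""))
    if not nameservices:
        return ["dfs.nameservices"]

    missing = []
    for ns in nameservices:
        provider_key = f"dfs.client.failover.proxy.provider.{ns}"
        if provider_key not in props:
            missing.append(provider_key)

        namenodes_key = f"dfs.ha.namenodes.{ns}"
        namenodes = split_list(props.get(namenodes_key, ""))
        if not namenodes:
            missing.append(namenodes_key)

        # Every NameNode ID listed for the nameservice needs an RPC address.
        for nn in namenodes:
            rpc_key = f"dfs.namenode.rpc-address.{ns}.{nn}"
            if rpc_key not in props:
                missing.append(rpc_key)
    return missing


# Demonstration values from this section, as entered in the properties table.
example = {
    "dfs.nameservices": "nameservice1",
    "dfs.client.failover.proxy.provider.nameservice1":
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "dfs.ha.namenodes.nameservice1": "namenode90,namenode96",
    "dfs.namenode.rpc-address.nameservice1.namenode90": "hdp-ha:8020",
    "dfs.namenode.rpc-address.nameservice1.namenode96": "hdp-ha2:8020",
}

for name in missing_ha_properties(example):
    print(f"Missing property: {name}")
```

This check only looks at property names. It does not verify that the hosts and ports are reachable, which is what the Check services button tests.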

Results

This way, these properties are automatically reused by every child connection created from this Hadoop connection.

The image above shows these properties inherited in the Hive connection wizard. For further information about how to access the Hive connection wizard as presented in this section, see Centralizing Hive metadata.