Centralizing a Hadoop connection - Cloud - 8.0

Talend Studio User Guide

Version: Cloud, 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-03-28
Available in: Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform

About this task

Setting up a connection to a given Hadoop distribution in the Repository saves you from configuring that connection again each time you use the same Hadoop distribution.

You must define a Hadoop connection before you can create, from the Hadoop cluster node, the connections to individual Hadoop elements such as HDFS or Hive.

Prerequisites:
  • Ensure that the client machine on which Talend Studio is installed can resolve the host names of the nodes of the Hadoop cluster to be used. To do so, add the IP address/hostname mapping entries for the services of that Hadoop cluster to the hosts file of the client machine.

    For example, if the host name of the Hadoop NameNode server is talend-cdh550.weave.local and its IP address is 192.168.x.x, the mapping entry reads 192.168.x.x talend-cdh550.weave.local.

  • The Hadoop cluster to be used has been properly configured and is running.

  • The Integration perspective is active.

  • If you need to connect to MapR from Talend Studio, ensure that you have installed the MapR client on the machine where Talend Studio is installed and added the MapR client library to the PATH variable of that machine. According to the MapR documentation, the client library or libraries for each OS version can be found under MAPR_INSTALL/hadoop/hadoop-VERSION/lib/native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client JAR file.
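
The host name prerequisite above can be checked before opening the wizard. The following Java snippet is a minimal sketch, not part of Talend Studio: the class HostMappingCheck and its methods parseHosts and resolves are hypothetical names introduced for illustration. It assumes the usual hosts file layout (an address followed by one or more names, with # starting a comment) and reuses the placeholder entry from the example above as plain text.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not a Talend API: inspects hosts-file text and
// checks whether a host name can be resolved from this machine.
class HostMappingCheck {

    // Parses hosts-file text into a map of host name -> address.
    // Comments (after '#') and blank lines are ignored.
    static Map<String, String> parseHosts(String hostsText) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String rawLine : hostsText.split("\\R")) {
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            // The first field is the address; the remaining fields are names or aliases.
            for (int i = 1; i < fields.length; i++) {
                mapping.putIfAbsent(fields[i].toLowerCase(Locale.ROOT), fields[0]);
            }
        }
        return mapping;
    }

    // Returns true if the JVM can resolve the given host name
    // (through the hosts file, DNS, or any other configured resolver).
    static boolean resolves(String hostName) {
        try {
            InetAddress.getByName(hostName);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample text only; 192.168.x.x is the placeholder used in the example above.
        String sample = "# Hadoop cluster services\n"
                + "192.168.x.x talend-cdh550.weave.local\n";
        System.out.println("Parsed entries: " + parseHosts(sample));

        // Pass a real cluster host name as the first argument to test resolution.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " resolves: " + resolves(host));
    }
}
```

Run it on the machine where Talend Studio is installed, with the host name of a cluster node as the argument. A false result suggests that the mapping entry is missing or misspelled in the hosts file.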

To create a Hadoop connection in the Repository, do the following:

Procedure

  1. In the Repository tree view of Talend Studio, expand Metadata and then right-click Hadoop cluster.
  2. Select Create Hadoop cluster from the contextual menu to open the Hadoop cluster connection wizard.
  3. Fill in the generic information about this connection, such as Name and Description, and click Next. The Hadoop Configuration Import Wizard window opens, in which you select either the manual or the automatic mode to configure the connection.