Connecting to custom Hadoop distribution

Talend Studio Developer Guide


After selecting this Custom option, click the button to display the [Import custom definition] dialog box and proceed as follows:

Note that custom versions are not officially supported by Talend. Talend and its community provide you with the opportunity to connect to custom versions from the Studio but cannot guarantee that the configuration of whichever version you choose will be easy, due to the wide range of different Hadoop distributions and versions that are available. As such, you should only attempt to set up such a connection if you have sufficient Hadoop experience to handle any issues on your own.

  1. Depending on your situation, select Import from existing version or Import from zip to configure the custom Hadoop distribution to be connected to.

    • If you have the configuration zip file of the custom Hadoop distribution you need to connect to, select Import from zip. In Talend Exchange, members of the Talend community have shared some ready-for-use configuration zip files, which you can download from this Hadoop configuration list and use directly in your connection. However, because the different Hadoop-related projects keep evolving, you might not find the configuration zip corresponding to your distribution in this list. In that case, it is recommended to use the Import from existing version option to take an existing distribution as a base and add the jars required by your distribution.

      Note that the zip files are only configuration files and cannot be installed directly from Talend Exchange.

    • Otherwise, select Import from existing version to import an officially supported Hadoop distribution as a base and customize it by following the wizard. This approach requires knowledge of the configuration of the Hadoop distribution to be used.

    Note that the check boxes in the wizard allow you to select the Hadoop element(s) you need to import. Not all of the check boxes are displayed in every case; which ones appear depends on the context in which you are creating the connection. For example, if you are creating this connection for Oozie, then only the Oozie check box appears.

  2. Whether you have selected Import from existing version or Import from zip, verify that the check box next to each Hadoop element you need to import has been selected.

  3. Click OK and then, in the pop-up warning, click Yes to accept overwriting any custom setup of jar files previously implemented.

    Once done, the [Custom Hadoop version definition] dialog box becomes active.

    This dialog box lists the Hadoop elements and their jar files you are importing.

  4. If you have selected Import from zip, click OK to validate the imported configuration.

    If you have selected Import from existing version, you still need to add more jar files to customize that base version. To do so, from the tab of the Hadoop element you need to customize, for example, the HDFS/HCatalog/Oozie tab, click the [+] button to open the [Select libraries] dialog box.

  5. Select the External libraries option to open its view.

  6. Browse to and select any jar file you need to import.

  7. Click OK to validate the changes and to close the [Select libraries] dialog box.

    Once done, the selected jar file appears on the list in the tab of the Hadoop element being configured.

    Note that if you need to share the custom Hadoop setup with another Studio, you can export this custom connection from the [Custom Hadoop version definition] window using the button.

  8. In the [Custom Hadoop version definition] dialog box, click OK to validate the customized configuration. This brings you back to the configuration view in which you have selected the Custom option.
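Steps 5 to 7 assume you already know where the jar files of your distribution are stored on disk. The following is a minimal Python sketch, not part of the Studio, that lists the jar files found under a folder so that you know where to browse to in the [Select libraries] dialog box. The function names `find_jars` and `group_by_folder` are hypothetical helpers introduced for illustration only, and the folder you pass in depends entirely on your own installation.

```python
from pathlib import Path


def find_jars(root):
    """Return the sorted paths of all .jar files found under root, recursively."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # rglob walks every subfolder; the match is case-sensitive on most systems.
    return sorted(p for p in root_path.rglob("*.jar") if p.is_file())


def group_by_folder(jar_paths):
    """Group jar file names by parent folder, to make browsing easier."""
    groups = {}
    for path in jar_paths:
        groups.setdefault(str(path.parent), []).append(path.name)
    return groups
```

Calling `group_by_folder(find_jars(folder))` on the folder that holds the client libraries of your distribution gives one list of jar names per subfolder. Which of those jars a given Hadoop element actually requires still has to be determined from your distribution's own documentation.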

Now that the custom Hadoop version has been configured and you are back in the Hadoop connection configuration view, you can continue to enter the other parameters required by the connection.

If the custom Hadoop version you need to connect to contains YARN and you want to use it, select the Use YARN check box next to the Distribution list.

A video is available at the following link. Taking HDFS as an example, it demonstrates how to set up the connection to a custom Hadoop cluster, also referred to as an unsupported Hadoop distribution: How to add an unsupported Hadoop distribution to the Studio.