
Contextualizing the Hadoop connection parameters

Contextualize the Hadoop connection parameters to make this connection portable over different Hadoop environments such as a test environment and a production environment.

Before you begin

  • Ensure that the client machine on which Talend Studio is installed can recognize the host names of the nodes of the Hadoop cluster to be used. For this purpose, add the IP address/hostname mapping entries for the services of that Hadoop cluster in the hosts file of the client machine.

    For example, if the host name of the Hadoop NameNode server is talend-cdh550.weave.local, and its IP address is 192.168.x.x, the mapping entry reads 192.168.x.x talend-cdh550.weave.local (see the sample hosts file entries after this list).

  • The Hadoop cluster to be used has been properly configured and is running.

  • A Hadoop connection has been properly set up following the explanations in Setting up the Hadoop connection.

  • The Integration perspective is active.

  • Cloudera is the distribution used as the example in this article. If you are using a different distribution, keep in mind the following distribution-specific prerequisites:
    • If you need to connect to MapR from the Studio, ensure that you have installed the MapR client on the machine where the Studio is installed, and added the MapR client library to the PATH variable of that machine. According to the MapR documentation, the library or libraries of a MapR client for each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the MapR documentation.

    • If you need to connect to a Google Dataproc cluster, set the path to the Google credentials file associated with the service account to be used in the environment variables of your local machine, so that the Check service feature of the metadata wizard can properly verify your configuration (see the example after this list).

      For further information about how to set the environment variable, see Getting Started with Authentication in the Google documentation.
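As an illustration of the prerequisites above, the following sketches show typical values; all paths, host names, and IP addresses are placeholders to adapt to your own environment.

The hosts file of the client machine (/etc/hosts on Linux, C:\Windows\System32\drivers\etc\hosts on Windows) would contain one mapping entry per cluster service, for example:

  192.168.x.x   talend-cdh550.weave.local

For a Google Dataproc cluster, the environment variable described in Getting Started with Authentication is GOOGLE_APPLICATION_CREDENTIALS. On Linux, you could for example set it before launching the Studio:

  export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

On Windows, the equivalent would be:

  set GOOGLE_APPLICATION_CREDENTIALS=C:\path\to\service-account-key.json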

Procedure

  1. In the Repository tree view of your Studio, expand Metadata > Hadoop cluster and double-click the Hadoop connection you created following Setting up the Hadoop connection.
  2. Click Next to go to the step 2 window of this wizard and click Export as context.
  3. In the Create/Reuse a context group wizard, select Create a new repository context and click Next.
  4. Enter a name for the context group, for example smart_connection, and click Next.

    A read-only view of this context group is created and automatically filled with the parameters of the given Hadoop connection you defined in Setting up the Hadoop connection.

    You may also notice that not all of the connection parameters are added to the context group; this is expected, as not every parameter can be contextualized.

  5. Click Finish to validate the creation and switch back to the step 2 window of the Hadoop connection wizard.

    The connection parameters have been automatically set to use the context variables and become read-only.

  6. Click Finish.

    The new context group, named smart_connection, has been created under the Contexts node.

  7. In Repository, double-click this new context group to open the Create/Edit a context group wizard.
  8. Click Next to go to step 2 in order to edit the context variables.
  9. Click the [+] button on the right of the table to add a new context.
  10. Click New and enter the name of this new context, for example, prod.
  11. Click OK to validate the changes and close the New context wizard. The new context is added to the context list.
  12. Click OK to close the Configure contexts wizard and go back to the Create/Edit a context group wizard.
  13. Define the new context to contain the connection parameter values for a different Hadoop cluster, for example, your production one.
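    For example, assuming the context group exposes a variable holding the NameNode URI (the actual variable names depend on the parameters of your connection), the Default context could keep the value of the test cluster while the prod context receives the value of the production cluster; both values below are placeholders:

      Default: hdfs://talend-cdh550.weave.local:8020
      prod:    hdfs://namenode.prod.local:8020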
  14. Click Finish to validate the changes and accept the propagation.
  15. Back under the Hadoop cluster node in the Repository, double-click the Hadoop connection you are contextualizing to open its wizard.
  16. In the step 2 window of this wizard, ensure that the Use custom Hadoop configuration check box is selected and click Configuration to open the Hadoop configuration wizard.

    The prod context is displayed in the wizard and the message "Please import the jar." next to it prompts you to import the Hadoop configuration file specific to the Hadoop cluster this prod context is created for.

    You can also notice that the Default context, the first context generated for this Hadoop connection when you created the smart_connection context group, already has a Hadoop configuration jar file. This jar file was automatically generated at the end of the process of defining the Hadoop connection and creating the Default context for it.

    You can also select the Set path to custom Hadoop configuration JAR check box to specify the path to the configuration JAR file.

  17. Click the field of this "Please import the jar." message to display the [...] button, then click this button to open the Hadoop configuration import wizard.

    This step starts the same process as explained in Setting up the Hadoop connection to set up the Hadoop configuration either automatically or manually. However, at the end of this process, only the appropriate Hadoop configuration jar file is generated for the prod context; no new Hadoop connection item is created under the Hadoop cluster node.

  18. Click OK to validate the changes and then click Finish to validate the contextualization and close the Hadoop connection wizard.

    If prompted, click Yes to accept the propagation.

  19. The Hadoop connection is now contextualized, and you can continue to create child connections to its elements, such as HBase, HDFS, and Hive, based on this connection. Each connection wizard contains the Export as context button that you can use to contextualize each connection.

Results

When you reuse these connections via the Property type list in a given component in your Jobs, these contexts are available for selection in the Run view of the Job.
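For example, you can select Default or prod from the context list in the Run view before executing the Job, so that the same Job targets either the test or the production cluster. In the Java code of a component such as tJava, the generated variables are read through the context object; the variable name below is a hypothetical one derived from the smart_connection context group:

  System.out.println(context.smart_connection_namenode_uri);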
