Accessing files on a Hadoop cluster from your engine - Cloud

Talend Remote Engine Gen2 Quick Start Guide

Version: Cloud
Product: Talend Cloud
Modules: Talend Management Console, Talend Pipeline Designer
Last publication date: 2024-01-24

Before you begin

  • Make sure you use a recent version of docker-compose to avoid issues with volumes not being mounted correctly.
  • Contact your system administrator to get the complete set of Hadoop configuration files (core-site.xml, hdfs-site.xml, etc.).
  • Put these Hadoop configuration files in a folder on your local machine and note its path.
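As a quick sanity check before continuing, you can verify that the local folder actually contains the client-side configuration files the engine needs. This is a minimal sketch: for demonstration it creates a sample folder with empty files, but you would point HADOOP_CONF_SRC at your real configuration folder instead.

```shell
# Sketch: check that a local folder contains the expected Hadoop client
# configuration files. HADOOP_CONF_SRC is a placeholder -- in practice,
# set it to the folder where you copied the files from your administrator.
HADOOP_CONF_SRC=$(mktemp -d)
touch "$HADOOP_CONF_SRC/core-site.xml" "$HADOOP_CONF_SRC/hdfs-site.xml"

# Collect any files that are missing from the folder.
missing=""
for f in core-site.xml hdfs-site.xml; do
  [ -f "$HADOOP_CONF_SRC/$f" ] || missing="$missing $f"
done

if [ -z "$missing" ]; then
  echo "configuration folder complete: $HADOOP_CONF_SRC"
else
  echo "missing files:$missing"
fi
```

Depending on your cluster (for example, if you use YARN or Hive), your administrator may provide additional files such as yarn-site.xml; extend the list in the loop accordingly.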

Procedure

  1. Go to one of the following folders in the Remote Engine Gen2 installation directory:
      • default if you are using the engine in the AWS USA, AWS Europe, AWS Asia-Pacific, or Azure regions.
      • eap if you are using the engine as part of the Early Adopter Program.

  2. Create a new file and name it:
    docker-compose.override.yml
  3. Edit this file to add the following:
    version: '3.6'
    
    services: 
    
      livy: 
        environment: 
          HADOOP_CONF_DIR: file:/opt/my-hadoop-cluster-config
        volumes: 
          - YOUR_LOCAL_HADOOP_CONFIGURATION_FOLDER:/opt/my-hadoop-cluster-config
       
      component-server: 
        environment: 
          HADOOP_CONF_DIR: file:/opt/my-hadoop-cluster-config
        volumes: 
          - YOUR_LOCAL_HADOOP_CONFIGURATION_FOLDER:/opt/my-hadoop-cluster-config

    where YOUR_LOCAL_HADOOP_CONFIGURATION_FOLDER corresponds to the path to the local folder where your Hadoop configuration files are stored.

  4. Save the file to apply your changes.
  5. Restart your Remote Engine Gen2.
  6. Connect to Talend Cloud Pipeline Designer.
  7. Go to the Connections page and add a new HDFS connection using your engine and your local user name.
    Adding a new HDFS connection.
  8. Add a new HDFS dataset using the new connection and make sure you use the path to your files (for example hdfs://namenode:8020/user/talend/files).
    Adding a new HDFS dataset.
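Steps 1 through 4 above can be sketched as a single script. The paths here are assumptions: ENGINE_DIR stands in for your Remote Engine Gen2 installation folder (default or eap), and HADOOP_CONF_SRC for the local folder holding your Hadoop configuration files. The script only writes the override file; restarting the engine is left as a comment because the exact command depends on your installation.

```shell
# Sketch: generate docker-compose.override.yml for the Remote Engine Gen2.
# ENGINE_DIR and HADOOP_CONF_SRC are placeholder paths -- replace them
# with your real installation and configuration folders.
ENGINE_DIR=$(mktemp -d)              # e.g. .../remote-engine-gen2/default
HADOOP_CONF_SRC="/opt/hadoop-conf"   # your local Hadoop configuration folder

# Write the override file; the heredoc expands $HADOOP_CONF_SRC so the
# volume mapping points at your local folder.
cat > "$ENGINE_DIR/docker-compose.override.yml" <<EOF
version: '3.6'

services:

  livy:
    environment:
      HADOOP_CONF_DIR: file:/opt/my-hadoop-cluster-config
    volumes:
      - $HADOOP_CONF_SRC:/opt/my-hadoop-cluster-config

  component-server:
    environment:
      HADOOP_CONF_DIR: file:/opt/my-hadoop-cluster-config
    volumes:
      - $HADOOP_CONF_SRC:/opt/my-hadoop-cluster-config
EOF

echo "wrote $ENGINE_DIR/docker-compose.override.yml"
# Then restart the engine from its installation folder, for example:
#   cd "$ENGINE_DIR" && docker-compose down && docker-compose up -d
```

docker-compose automatically merges docker-compose.override.yml with the base docker-compose.yml in the same folder, which is why no extra flag is needed when restarting.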