Accessing files from your engine - Cloud

Talend Remote Engine Gen2 Quick Start Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Management Console, Talend Pipeline Designer
Content: Deployment > Deploying > Executing Pipelines, Installation and Upgrade

Before you begin

Make sure you are using a recent version of docker-compose to avoid issues with volumes not being mounted correctly in Livy.
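As a quick sanity check, you can print the installed version from a terminal on the engine host (this assumes docker-compose is on your PATH; the exact minimum version depends on your installation):

    # Print the installed docker-compose version on the engine host.
    docker-compose version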

Procedure

  1. Go to the following folder in the Remote Engine Gen2 installation directory:
    default if you are using the engine in the AWS USA, AWS Europe, AWS Asia-Pacific or Azure regions.

    eap if you are using the engine as part of the Early Adopter Program.

  2. Create a new file and name it:
    docker-compose.override.yml
  3. Edit this file to add the following:
    version: '3.6'
    
    services: 
    
      livy: 
        volumes: 
    
      component-server: 
        volumes: 
  4. Add a new entry under volumes, using this format:
    YOUR_LOCAL_FOLDER:MOUNT_POINT_INSIDE_CONTAINER

    Example

    If you have files in /home/user/my_avro_files on your machine that you would like to process with Talend Cloud Pipeline Designer, add /home/user/my_avro_files:/opt/my_avro_files to the list of volumes:
    version: '3.6'
    
    services: 
    
      livy: 
        volumes: 
          - /home/user/my_avro_files:/opt/my_avro_files
    
      component-server: 
        volumes: 
          - /home/user/my_avro_files:/opt/my_avro_files
  5. Save the file to apply your changes.
  6. Restart your Remote Engine Gen2 (see the command sketch after this procedure).
    Your folder will now be accessible from the Talend Cloud Pipeline Designer app under /opt/my_avro_files.
  7. Connect to Talend Cloud Pipeline Designer.
  8. Go to the Connections page and add a new HDFS connection using your engine and your local user name.
  9. Add a new HDFS dataset using the new connection and make sure you use the mount path (/opt/my_avro_files in this example) as the path to your folder.
  10. Optional: To write back to your local machine, add another HDFS dataset that points to a folder under the mount path, for example /opt/my_avro_files/my_pipeline_output.
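The restart referenced in step 6 might look like the following. This is a minimal sketch assuming the engine is managed directly with docker-compose from the default (or eap) folder and that the service names match the compose file above; if your installation ships its own start/stop script, use that instead.

    # Restart the containers so docker-compose.override.yml is picked up.
    # Run these from the folder containing docker-compose.yml and the override file.
    docker-compose down
    docker-compose up -d

    # Optional checks: print the merged configuration to confirm the extra
    # volume entries, then list the mounted folder inside the Livy container.
    docker-compose config
    docker-compose exec livy ls /opt/my_avro_files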