Defining the AWS Qubole connection parameters for Spark Jobs - 7.1

Qubole

author
Talend Documentation Team
EnrichVersion
7.1
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
task
Design and Development > Designing Jobs > Serverless > Qubole
EnrichPlatform
Talend Studio

Complete the Qubole connection configuration in the Spark configuration tab of the Run view of your Job. This configuration is effective on a per-Job basis.

Qubole is supported only in the traditional data integration framework (the Standard framework) and in the Spark frameworks.

The information in this section is only for users who have subscribed to Talend Data Fabric or to any Talend product with Big Data; it is not applicable to Talend Open Studio for Big Data users.

Before you begin

  • You have properly set up your Qubole cluster on AWS. For further information about how to do this, see Getting Started with Qubole on AWS from the Qubole documentation.
  • Ensure that the Qubole account to be used has the proper IAM role that is allowed to read/write to the S3 bucket to be used. For further details, contact the administrator of your Qubole system or see Cross-account IAM Role for QDS from the Qubole documentation.
  • Ensure that the AWS account to be used has the proper read/write permissions to the S3 bucket to be used. For this purpose, contact the administrator of your AWS system. A quick way to verify this access is sketched after this list.
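
A minimal sketch of such a check, assuming the hypothetical bucket name my-qubole-dependencies and placeholder credentials, is shown below. It uses the boto3 library to probe read/write access to the bucket; adapt the names, region, and credentials to your own environment.

  # Hypothetical values: replace the bucket, region, and credentials with your own.
  import uuid

  import boto3
  from botocore.exceptions import ClientError

  BUCKET = "my-qubole-dependencies"   # the S3 bucket the Job will use
  REGION = "us-east-1"                # the AWS region of that bucket

  s3 = boto3.client(
      "s3",
      region_name=REGION,
      aws_access_key_id="MY_ACCESS_KEY",       # the same credentials as in the Studio
      aws_secret_access_key="MY_SECRET_KEY",
  )

  probe_key = f"access-check/{uuid.uuid4()}.txt"
  try:
      s3.head_bucket(Bucket=BUCKET)                             # bucket exists and is reachable
      s3.put_object(Bucket=BUCKET, Key=probe_key, Body=b"ok")   # write permission
      s3.get_object(Bucket=BUCKET, Key=probe_key)               # read permission
      s3.delete_object(Bucket=BUCKET, Key=probe_key)            # remove the probe object
      print("Read/write access to", BUCKET, "confirmed.")
  except ClientError as error:
      print("Access check failed:", error)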

Procedure

  1. Enter the basic connection information to Qubole.

    Connection configuration

    • Click the [...] button next to the API Token field to enter the authentication token generated for the Qubole user account to be used. For further information about how to obtain this token, see Manage Qubole account from the Qubole documentation.

      This token identifies the user account you want to use to access Qubole. Your Job automatically uses the rights and permissions granted to this user account in Qubole. A quick way to verify the token is sketched at the end of this step.

    • Select the Cluster label check box and enter the name of the Qubole cluster to be used. If you leave this check box clear, the default cluster is used.

      If you need details about your default cluster, ask the administrator of your Qubole service. You can also read this article from the Qubole documentation for more information about configuring a default Qubole cluster.

    • Select the Change API endpoint check box and select the region to be used. If you leave this check box clear, the default region is used.

      For further information about the Qubole Endpoints supported on QDS-on-AWS, see Supported Qubole Endpoints on Different Cloud Providers.
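
    To quickly verify the API token (and list the clusters visible to this account) outside the Studio, you can call the QDS REST API directly, as sketched below. The endpoint path /api/v1.2/clusters and the X-AUTH-TOKEN header are assumptions based on the Qubole REST API; check the Qubole documentation for the exact endpoint of your region.

      # Hypothetical values: replace the endpoint and token with your own.
      import requests

      API_ENDPOINT = "https://api.qubole.com"   # default QDS-on-AWS endpoint; adjust if you changed the API endpoint
      API_TOKEN = "your-qubole-api-token"       # the token entered in the API Token field

      response = requests.get(
          f"{API_ENDPOINT}/api/v1.2/clusters",
          headers={"X-AUTH-TOKEN": API_TOKEN, "Accept": "application/json"},
          timeout=30,
      )
      response.raise_for_status()               # fails if the token or endpoint is wrong
      print("Token accepted. Clusters visible to this account:")
      print(response.json())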

  2. Configure the connection to the S3 file system to be used to temporarily store the dependencies of your Job so that your Qubole cluster has access to these dependencies.
    This configuration is used for your Job dependencies only. Use a tS3Configuration component in your Job to write your actual business data to S3 with Qubole. Without tS3Configuration, this business data is written to the Qubole HDFS system and is destroyed once you shut down your cluster.
    • Access key and Secret key: enter the authentication information required to connect to the Amazon S3 bucket to be used.

      To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

    • Bucket name: enter the name of the bucket in which you want to store the dependencies of your Job. This bucket must already exist on S3.
    • Temporary resource folder: enter the directory in which you want to store the dependencies of your Job. For example, enter temp_resources to write the dependencies in the /temp_resources folder in the bucket.

      If this folder already exists at runtime, its contents are overwritten by the dependencies being uploaded; otherwise, the folder is created automatically. The sketch after this list illustrates where these dependencies end up in the bucket.

    • Region: specify the AWS region by selecting a region name from the list. For more information about the AWS Region, see Regions and Endpoints.
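
    As a rough illustration of where these settings point, the sketch below (hypothetical bucket, folder, and file names) checks that the bucket really lives in the selected region and stages a sample file under the temporary resource folder, which is approximately where the Job dependencies are uploaded at runtime.

      # Hypothetical names: adjust the bucket, folder, and file to match your configuration.
      import boto3

      BUCKET = "my-qubole-dependencies"
      RESOURCE_FOLDER = "temp_resources"     # value of the Temporary resource folder field
      LOCAL_DEPENDENCY = "my-routines.jar"   # any local file standing in for a Job dependency

      s3 = boto3.client("s3")   # picks up credentials from your environment or AWS profile

      # get_bucket_location returns None for us-east-1 and a region code otherwise.
      location = s3.get_bucket_location(Bucket=BUCKET).get("LocationConstraint") or "us-east-1"
      print("Bucket region:", location, "- this should match the Region selected in the Studio.")

      # Staging a file under the resource folder mirrors the dependency layout, for example
      # s3://my-qubole-dependencies/temp_resources/my-routines.jar
      s3.upload_file(LOCAL_DEPENDENCY, BUCKET, f"{RESOURCE_FOLDER}/my-routines.jar")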

Results