tAmazonEMRManage properties - 6.1

Talend Components Reference Guide

Component family

Cloud

Basic settings

Access Key and Secret Key

Enter the access key and the secret key required by Amazon to authenticate your requests to its web services. These access credentials are generated from the Security Credential tab of your Amazon account page.

To enter the secret key, click the [...] button next to the secret key field, enter the secret key between double quotes in the pop-up dialog box, and click OK to save the settings.

Configuration

Action

Select an action to be performed from the list, either Start or Stop.

  • Start: launch an Amazon EMR cluster.

  • Stop: terminate an Amazon EMR cluster.

Region

Specify the AWS region by selecting a region name from the list or entering a region between double quotation marks (for example "us-east-1"). For more information about how to specify the AWS region, see Choose an AWS Region.

Cluster name

Enter the name of the cluster.

Cluster version

Select the version of the cluster.

Application

Select the applications to be installed on the cluster.

This list is available only when an EMR version is selected from the Cluster version list.

Service role

Enter the IAM (Identity and Access Management) role for the Amazon EMR service. The default role is EMR_DefaultRole. To use this default role, you must have already created it.

Job flow role

Enter the IAM role for the EC2 instances that Amazon EMR manages. The default role is EMR_EC2_DefaultRole. To use this default role, you must have already created it.

Enable log

Select this check box to enable logging and, in the field displayed, specify the path to a folder in an S3 bucket where you want Amazon EMR to write the log data.

Use EC2 key pair

Select this check box to associate an Amazon EC2 (Elastic Compute Cloud) key pair with the cluster and, in the field displayed, enter the name of your EC2 key pair.

Predicate

Specify the cluster(s) that you want to stop:

  • All running clusters: all running clusters will be stopped.

  • All running clusters with predefined name: the running clusters with a given name will be stopped. In the Cluster name field displayed, you need to specify the name of the cluster(s) to be stopped.

  • Running cluster with predefined id: the running cluster with a given ID will be stopped. In the Cluster id field displayed, you need to specify the ID of the cluster to be stopped.

This list is available only when Stop is selected from the Action list.
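
The three Predicate options can be read as filters over the set of running clusters. The following is a minimal sketch of that selection logic, not the component's actual implementation. The Cluster class, the StopTarget enum, the selectIds method, and the sample IDs are hypothetical names introduced for illustration, and matching every running cluster that shares the given name is an assumption based on the option label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StopPredicateSketch {

    // Hypothetical stand-in for a cluster summary; not a Talend or AWS type.
    static final class Cluster {
        final String id;
        final String name;
        final boolean running;

        Cluster(String id, String name, boolean running) {
            this.id = id;
            this.name = name;
            this.running = running;
        }
    }

    // Hypothetical enum mirroring the three options of the Predicate list.
    enum StopTarget { ALL_RUNNING, RUNNING_WITH_NAME, RUNNING_WITH_ID }

    // Returns the IDs of the running clusters that the chosen option selects.
    // 'value' is the cluster name or ID, and is ignored for ALL_RUNNING.
    static List<String> selectIds(List<Cluster> clusters, StopTarget target, String value) {
        List<String> ids = new ArrayList<>();
        for (Cluster c : clusters) {
            if (!c.running) {
                continue; // clusters that are not running are never selected
            }
            boolean match;
            switch (target) {
                case RUNNING_WITH_NAME:
                    match = c.name.equals(value);
                    break;
                case RUNNING_WITH_ID:
                    match = c.id.equals(value);
                    break;
                default:
                    match = true; // ALL_RUNNING
            }
            if (match) {
                ids.add(c.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Cluster> clusters = Arrays.asList(
            new Cluster("j-1", "etl", true),
            new Cluster("j-2", "etl", false),
            new Cluster("j-3", "reports", true));
        System.out.println(selectIds(clusters, StopTarget.RUNNING_WITH_NAME, "etl"));
    }
}
```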

Instance Configuration

Instance count

Enter the number of Amazon EC2 instances to initialize.

Master instance type

Select the type of the master instance to initialize.

Slave instance type

Select the type of the slave instance to initialize.

Advanced settings

Wait for cluster ready

Select this check box to let your Job wait until the launch of the cluster is completed.
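
Waiting for a cluster launch to complete generally amounts to polling the cluster state until it is ready or has failed. The sketch below illustrates that idea under stated assumptions and is not how the component is known to be implemented. The waitUntilReady method, its parameters, and the stateSource supplier are hypothetical; the state names follow the Amazon EMR cluster states (STARTING, BOOTSTRAPPING, RUNNING, WAITING, and the TERMINAT* states).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

class WaitForClusterSketch {

    // Polls a hypothetical state source until the cluster is ready.
    // Returns true when the state is RUNNING or WAITING, and false when the
    // cluster is terminating or terminated, or when maxPolls is exhausted.
    static boolean waitUntilReady(Supplier<String> stateSource, int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            String state = stateSource.get();
            if ("RUNNING".equals(state) || "WAITING".equals(state)) {
                return true;
            }
            if (state != null && state.startsWith("TERMINAT")) {
                return false; // TERMINATING, TERMINATED, TERMINATED_WITH_ERRORS
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated sequence of states; a real Job would query Amazon EMR instead.
        Iterator<String> states = Arrays.asList("STARTING", "BOOTSTRAPPING", "WAITING").iterator();
        System.out.println(waitUntilReady(states::next, 10, 0L));
    }
}
```

Bounding the number of polls keeps the sketch from waiting indefinitely; whether the component applies a comparable limit is not stated in this reference.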

Visible to all users

Select this check box to make the cluster visible to all IAM users.

Termination Protect

Select this check box to enable termination protection to prevent instances in the cluster from shutting down due to errors or issues during processing.

Master security group

Specify the security group for the master instance.

Additional master security groups

Specify additional security groups for the master instance and separate them with a comma, for example, gname1, gname2, gname3.

Slave security group

Specify the security group for the slave instances.

Additional slave security groups

Specify additional security groups for the slave instances and separate them with a comma, for example, gname1, gname2, gname3.
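
The comma-separated format used by both additional security group fields can be split into individual group names as sketched below. This is only an illustration of the format; the parse method is a hypothetical helper, and trimming whitespace and skipping empty entries are assumptions rather than documented component behavior.

```java
import java.util.ArrayList;
import java.util.List;

class SecurityGroupListSketch {

    // Splits a value such as "gname1, gname2, gname3" into trimmed group names.
    // Empty entries (for example from a trailing comma) are skipped.
    static List<String> parse(String raw) {
        List<String> groups = new ArrayList<>();
        if (raw == null) {
            return groups;
        }
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                groups.add(name);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(parse("gname1, gname2, gname3"));
    }
}
```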

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

CLUSTER_FINAL_ID: the ID of the cluster. This is an After variable and it returns a string.

CLUSTER_FINAL_NAME: the name of the cluster. This is an After variable and it returns a string.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the component has a Die on error check box and that check box is cleared.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.
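
In the Java code of a Job, component variables are usually read from the globalMap, keyed by the component label followed by an underscore and the variable name. The sketch below simulates that lookup with a plain HashMap so that it runs outside the Studio. The label tAmazonEMRManage_1, the describe method, and the sample values are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariablesSketch {

    // Reads the After variables of a component from a globalMap-like map.
    // 'componentLabel' is the label of the component in the Job, assumed
    // here to be "tAmazonEMRManage_1".
    static String describe(Map<String, Object> globalMap, String componentLabel) {
        String id = (String) globalMap.get(componentLabel + "_CLUSTER_FINAL_ID");
        String name = (String) globalMap.get(componentLabel + "_CLUSTER_FINAL_NAME");
        String error = (String) globalMap.get(componentLabel + "_ERROR_MESSAGE");
        if (error != null) {
            return "error: " + error;
        }
        return "cluster " + name + " (" + id + ")";
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that the Studio generates for a Job.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tAmazonEMRManage_1_CLUSTER_FINAL_ID", "j-EXAMPLE");      // hypothetical value
        globalMap.put("tAmazonEMRManage_1_CLUSTER_FINAL_NAME", "my-cluster");   // hypothetical value
        System.out.println(describe(globalMap, "tAmazonEMRManage_1"));
    }
}
```

Because these are After variables, such a lookup only makes sense in a subJob or component that runs once tAmazonEMRManage has finished.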

Usage

tAmazonEMRManage is usually used as a standalone component.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see the Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.