tS3BucketExist - 6.1

Talend Components Reference Guide

Version: 6.1
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for Data Quality, Talend Open Studio for ESB, Talend Open Studio for MDM, Talend Real-Time Big Data Platform
Task: Data Governance, Data Quality and Preparation, Design and Development
Platform: Talend Studio

tS3BucketExist properties

Component family

Cloud/AmazonS3

 

Function

Checks whether a bucket exists on Amazon S3.

"Bucket" is the term used by AWS for a top-level folder on S3, which can contain subfolders and stores all your data (objects).

Purpose

tS3BucketExist is designed to verify whether the specified bucket exists on Amazon S3.

Basic settings

Use existing connection

Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.

 

Access Key

The Access Key ID that uniquely identifies an AWS Account. To learn how to get your Access Key and Access Secret, see Getting Your AWS Access Keys.

Access Secret

The Secret Access Key, which constitutes the security credentials in combination with the Access Key.

To enter the secret key, click the [...] button next to the secret key field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

Region

Specify the AWS region by selecting a region name from the list or by entering a region name between double quotation marks (for example, "us-east-1") in the list field. For more information about AWS regions, see Regions and Endpoints.

Bucket

Name of the bucket, that is, the top-level folder, on the S3 server.

Die on error

Select this check box to stop the execution of the Job when an error occurs. This check box is cleared by default, so that rows in error are skipped and the process completes for error-free rows.
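The effect of the Die on error setting can be illustrated with the Java sketch below. This is only an illustration under stated assumptions, not the code generated by Talend Studio: the class name, the check method, and the simulated failing call are hypothetical, the map stands in for the globalMap of a Job, and the variable keys assume that the component is labeled tS3BucketExist_1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

final class DieOnErrorSketch {

    // Hypothetical helper: runs the existence check and applies the Die on error rule.
    static void check(Callable<Boolean> existsCall, boolean dieOnError,
                      Map<String, Object> globalMap) {
        try {
            globalMap.put("tS3BucketExist_1_BUCKET_EXIST", existsCall.call());
        } catch (Exception e) {
            if (dieOnError) {
                // Check box selected: the error stops the execution.
                throw new RuntimeException(e);
            }
            // Check box cleared: the error is recorded and the execution continues.
            globalMap.put("tS3BucketExist_1_ERROR_MESSAGE", e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Stand-in for the map that a running Job provides.
        Map<String, Object> globalMap = new HashMap<>();
        // Simulated failure of the call to S3 (assumption for the example).
        check(() -> { throw new IllegalStateException("access denied"); }, false, globalMap);
        System.out.println(globalMap.get("tS3BucketExist_1_ERROR_MESSAGE"));
    }
}
```

With the flag set to false, the sketch stores the message instead of throwing, which mirrors the description of the ERROR_MESSAGE variable in the Global Variables section.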

Advanced settings

Config client

Select this check box to configure client parameters.

Client parameter: select client parameters from the list.

Value: enter the parameter value.

Not available when Use existing connection is selected.

 

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Dynamic settings

Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio.

Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.

For examples of using dynamic parameters, see Scenario 3: Reading data from MySQL databases through context-based dynamic connections and Scenario: Reading data from different MySQL databases using dynamically loaded connection parameters. For more information on Dynamic settings and context variables, see Talend Studio User Guide.

Global Variables

BUCKET_EXIST: the existence of a specified bucket. This is a Flow variable and it returns a boolean.

BUCKET_NAME: the name of a specified bucket. This is an After variable and it returns a string.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
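As an illustration of how these variables can be read in Java code, for example in a tJava component placed after tS3BucketExist, the sketch below looks them up in a map by key. It assumes that the component is labeled tS3BucketExist_1, so that the keys follow the pattern used in the scenario of this section. The class name, the describe method, and the sample bucket name are hypothetical, and the HashMap only stands in for the globalMap that a Job provides.

```java
import java.util.HashMap;
import java.util.Map;

final class GlobalVariablesSketch {

    // Hypothetical helper: builds a one-line summary from the three variables.
    static String describe(Map<String, Object> globalMap) {
        Boolean exists = (Boolean) globalMap.get("tS3BucketExist_1_BUCKET_EXIST");
        String name = (String) globalMap.get("tS3BucketExist_1_BUCKET_NAME");
        String error = (String) globalMap.get("tS3BucketExist_1_ERROR_MESSAGE");
        if (error != null) {
            return "check failed for " + name + ": " + error;
        }
        // Boolean.TRUE.equals(...) also copes with a variable that was never set.
        return name + (Boolean.TRUE.equals(exists) ? " exists" : " does not exist");
    }

    public static void main(String[] args) {
        // Stand-in for the map that a running Job provides (assumption).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tS3BucketExist_1_BUCKET_EXIST", Boolean.FALSE);
        globalMap.put("tS3BucketExist_1_BUCKET_NAME", "my-example-bucket");
        System.out.println(describe(globalMap)); // prints "my-example-bucket does not exist"
    }
}
```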

Usage

This component is usually used with other S3 components, e.g. tS3BucketCreate.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab view. You can also find and add all missing JARs on the Modules tab in the Integration perspective of your Studio. For details, see https://help.talend.com/display/KB/How+to+install+external+modules+in+the+Talend+products or the section describing how to configure the Studio in the Talend Installation Guide.

Scenario: Verifying the absence of a bucket, creating it and listing all the S3 buckets

In this scenario, tS3BucketExist is used to verify the absence of a bucket, tS3BucketCreate to create that bucket upon confirmation, and tS3BucketList to list all the buckets on Amazon S3.

Linking the components

  1. Drop tS3Connection, tS3BucketExist, tS3BucketCreate, tS3BucketList, tIterateToFlow and tLogRow onto the workspace.

  2. Link tS3Connection to tS3BucketExist using the OnSubjobOk trigger.

  3. Link tS3BucketExist to tS3BucketCreate using the Run if trigger.

  4. Link tS3BucketCreate to tS3BucketList using the OnSubjobOk trigger.

  5. Link tS3BucketList to tIterateToFlow using the Row > Iterate connection.

  6. Link tIterateToFlow to tLogRow using the Row > Main connection.
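The control flow built by these links can be sketched in plain Java as follows. This is a simplified model and not the code generated for the Job: an in-memory set stands in for the buckets on Amazon S3, the class name, the run method, and the bucket names are hypothetical, and the Run if trigger is assumed to carry the "bucket does not exist" condition defined in the configuration steps of this scenario.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class BucketFlowSketch {

    // Returns the bucket names that would reach tLogRow.
    static List<String> run(Set<String> buckets, String bucketName) {
        List<String> logged = new ArrayList<>();
        boolean bucketExist = buckets.contains(bucketName); // tS3BucketExist
        if (!bucketExist) {                                  // Run if trigger
            buckets.add(bucketName);                         // tS3BucketCreate
            logged.addAll(buckets);                          // tS3BucketList, then tIterateToFlow
        }
        // If the bucket already exists, nothing downstream of the trigger runs.
        return logged;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the buckets of an S3 account (assumption).
        Set<String> buckets = new TreeSet<>();
        buckets.add("backups");
        buckets.add("logs");
        for (String name : run(buckets, "reports")) {
            System.out.println(name);                        // tLogRow
        }
    }
}
```

Because tS3BucketList is chained after tS3BucketCreate, the sketch returns an empty list when the bucket already exists.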

Configuring the components

  1. Double-click tS3Connection to open its Basic settings view.

  2. In the Access Key and Secret Key fields, enter the authentication credentials.

  3. Double-click tS3BucketExist to open its Basic settings view.

  4. Select the Use existing connection check box to reuse the connection.

  5. In the Bucket field, enter the name of the bucket whose existence you want to check.

  6. Double-click the If link to define the condition.

  7. In the Condition box, enter the expression:

    !((Boolean)globalMap.get("tS3BucketExist_1_BUCKET_EXIST"))

    This way, the rest of the Job is executed only if the specified bucket does not exist.

  8. Double-click tS3BucketCreate to open its Basic settings view.

    Select the Use existing connection check box to reuse the connection.

    In the Bucket field, enter the bucket name to create.

  9. Double-click tS3BucketList to open its Basic settings view.

    Select the Use existing connection check box to reuse the connection.

  10. Double-click tIterateToFlow to open its Basic settings view.

  11. Click Edit schema to open the schema editor.

    Click the [+] button to add one column, namely bucket_list of the String type.

    Click OK to validate the setup and close the schema editor.

  12. In the Mapping area, press Ctrl + Space in the Value field to choose the variable tS3BucketList_1_CURRENT_BUCKET_NAME.

  13. Double-click tLogRow to open its Basic settings view.

    Select Table (print values in cells of a table) for a better display of the results.
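The Run if condition entered in step 7 can be tried outside the Studio with the Java sketch below. The first method reuses the expression from the Condition box as is. The second method is a hypothetical, more defensive variant that fires only when the variable was explicitly set to false. The class and method names are assumptions made for the example, and the HashMap stands in for the globalMap of the Job.

```java
import java.util.HashMap;
import java.util.Map;

final class RunIfConditionSketch {

    static final String KEY = "tS3BucketExist_1_BUCKET_EXIST";

    // The expression from the Condition box. Unboxing a missing (null) value
    // throws a NullPointerException.
    static boolean asDocumented(Map<String, Object> globalMap) {
        return !((Boolean) globalMap.get(KEY));
    }

    // Hypothetical variant: true only when the check reported that the bucket is absent.
    static boolean onlyWhenReportedAbsent(Map<String, Object> globalMap) {
        return Boolean.FALSE.equals(globalMap.get(KEY));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put(KEY, Boolean.FALSE);
        System.out.println(asDocumented(globalMap));                 // prints "true"
        System.out.println(onlyWhenReportedAbsent(new HashMap<>())); // prints "false"
    }
}
```

The variant avoids creating a bucket when the existence check did not produce a result, for example because it failed while Die on error was cleared.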

Executing the Job

  1. Press Ctrl + S to save the Job.

  2. Press F6 to run the Job.

    The tLogRow output in the console shows that the bucket is created and that all the buckets are listed.

  3. Go to the S3 web console to confirm that the bucket has been created on the S3 server.

  4. Refresh the S3 Browser client to confirm that the S3 Create action was performed successfully.