tGSPut - 6.3

Talend Components Reference Guide

EnrichVersion
6.3
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

Warning

This component is available in the Palette of the studio only if you have subscribed to one of the Talend solutions with Big Data.

Function

tGSPut uploads files from a local directory to Google Cloud Storage.

Purpose

tGSPut allows you to upload files to Google Cloud Storage so that you can manage them there.

tGSPut properties

Component Family

Big Data / Google Cloud Storage

 

Basic settings

Use an existing connection

Select this check box and, from the Component List, select the relevant connection component to reuse the connection details you have already defined.

 

Access Key and Secret Key

Type in the authentication information obtained from Google for making requests to Google Cloud Storage.

You can find these keys on the Interoperable Access tab under the Google Cloud Storage tab of your project in the Google APIs Console.

To enter the secret key, click the [...] button next to the secret key field, then enter the key between double quotes in the dialog box that opens and click OK to save the settings.

For more information about the access key and secret key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the description about developer keys.

Warning

The Access Key and Secret Key fields will be available only if you do not select the Use an existing connection check box.

 

Bucket name

Type in the name of the bucket into which you want to upload files.

 

Local directory

Type in the full path to the local directory containing the files to be uploaded, or browse to it.

 

Google Storage directory

Type in the Google Storage directory to which you want to upload files.

 

Use files list

Select this check box and complete the Files table.

  • Filemask: enter the filename or a filemask using wildcard characters (*) or regular expressions.

  • New name: enter a new name to give the file once it is uploaded.
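
A filemask such as "*.txt" selects files by pattern. As a minimal illustration of glob-style matching (this is only a sketch of the concept using the standard Java library, not Talend's internal implementation):

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

public class FilemaskDemo {
    // Returns true if the file name matches the glob-style filemask.
    static boolean matches(String filemask, String fileName) {
        PathMatcher m = FileSystems.getDefault()
                .getPathMatcher("glob:" + filemask);
        return m.matches(Path.of(fileName));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.txt", "computer_01.txt"));      // true
        System.out.println(matches("computer_*.csv", "computer_03.csv")); // true
        System.out.println(matches("*.txt", "computer_03.csv"));      // false
    }
}
```

A regular-expression filemask would be matched with `String.matches` instead of a glob `PathMatcher`.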

 

Die on error

This check box is cleared by default, meaning that rows in error are skipped and the process is completed for error-free rows.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
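
In the Java code generated for a Job, these variables are stored in the globalMap and are typically read in a tJava component with an expression such as `(Integer)globalMap.get("tGSPut_1_NB_LINE")`. The sketch below simulates the globalMap with a plain HashMap; the component name tGSPut_1 is an assumption for this illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVarsDemo {
    // Reads the NB_LINE After variable of a component assumed to be
    // named tGSPut_1; returns 0 if the variable has not been set.
    static int nbLine(Map<String, Object> globalMap) {
        Object v = globalMap.get("tGSPut_1_NB_LINE");
        return v == null ? 0 : (Integer) v;
    }

    public static void main(String[] args) {
        // Simulated globalMap; in a real Job, Talend populates this map.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tGSPut_1_NB_LINE", 3); // set after tGSPut finishes

        System.out.println("Files uploaded: " + nbLine(globalMap));

        // ERROR_MESSAGE is set only when an error occurs and the
        // Die on error check box is cleared; here it is absent.
        String error = (String) globalMap.get("tGSPut_1_ERROR_MESSAGE");
        System.out.println("Error: " + (error == null ? "none" : error));
    }
}
```

The cast is needed because the globalMap stores values as Object.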

Usage

This component can be used together with other components, particularly the tGSGet component.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a

Scenario: Managing files with Google Cloud Storage

This scenario describes a Job that uploads files from a local directory to a bucket in Google Cloud Storage, then performs copy, move, and delete operations on those files, and finally lists the files in the relevant buckets and displays them on the console.

Prerequisites: You have signed up for a Google Cloud Storage account and created three buckets under the same Google Storage directory. In this example, the buckets created are bighouse, bed_room, and study_room.

Dropping and linking the components

To design the Job, proceed as follows:

  1. Drop the following components from the Palette to the design workspace: one tGSConnection component, one tGSPut component, two tGSCopy components, one tGSDelete component, one tGSList component, one tIterateToFlow component, one tLogRow component and one tGSClose component.

  2. Connect tGSConnection to tGSPut using a Trigger > On Subjob Ok link.

  3. Connect tGSPut to the first tGSCopy using a Trigger > On Subjob Ok link.

  4. Do the same to connect the first tGSCopy to the second tGSCopy, connect the second tGSCopy to tGSDelete, connect tGSDelete to tGSList, and connect tGSList to tGSClose.

  5. Connect tGSList to tIterateToFlow using a Row > Iterate link.

  6. Connect tIterateToFlow to tLogRow using a Row > Main link.

Configuring the components

Opening a connection to Google Cloud Storage

  1. Double-click the tGSConnection component to open its Basic settings view in the Component tab.

  2. Navigate to the Google APIs Console in your web browser to access the Google project hosting the Cloud Storage services you need to use.

  3. Click Google Cloud Storage > Interoperable Access to open its view, and copy the access key and secret key.

  4. In the Component view of the Studio, paste the access key and secret key into the corresponding fields.

Uploading files to Google Cloud Storage

  1. Double-click the tGSPut component to open its Basic settings view in the Component tab.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier.

  3. In the Bucket name field, enter the name of the bucket into which you want to upload files. In this example, bighouse.

  4. In the Local directory field, browse to the directory from which the files will be uploaded, D:/Input/House in this example.

    The files under this directory are shown below:

  5. Leave other settings as they are.

Copying all files from one bucket to another bucket

  1. Double-click the first tGSCopy component to open its Basic settings view in the Component tab.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier.

  3. In the Source bucket name field, enter the name of the bucket from which you want to copy files, bighouse in this example.

  4. Select the Source is a folder check box. All files from the bucket bighouse will be copied.

  5. In the Target bucket name field, enter the name of the bucket into which you want to copy files, bed_room in this example.

  6. Select Copy from the Action list.

Moving a file from one bucket to another bucket and renaming it

  1. Double-click the second tGSCopy component to open its Basic settings view in the Component tab.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier.

  3. In the Source bucket name field, enter the name of the bucket from which you want to move files, bighouse in this example.

  4. In the Source object key field, enter the key of the object to be moved, computer_01.txt in this example.

  5. In the Target bucket name field, enter the name of the bucket into which you want to move files, study_room in this example.

  6. Select Move from the Action list. The specified source file computer_01.txt will be moved from the bucket bighouse to study_room.

  7. Select the Rename check box. In the New name field, enter a new name for the moved file. In this example, the new name is laptop.txt.

  8. Leave other settings as they are.

Deleting a file in one bucket

  1. Double-click the tGSDelete component to open its Basic settings view in the Component tab.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier.

  3. Select the Delete object from bucket list check box. Fill in the Bucket table with the file information that you want to delete.

    In this example, the file computer_03.csv will be deleted from the bucket bed_room, whose files were copied from the bucket bighouse.

Listing all files in the three buckets

  1. Double-click the tGSList component to open its Basic settings view in the Component tab.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier.

  3. Select the List objects in bucket list check box. In the Bucket table, enter the name of the three buckets in the Bucket name column, bighouse, study_room, and bed_room.

  4. Double-click the tIterateToFlow component to open its Basic settings view in the Component tab.

  5. Click Edit schema to define the data to pass on to tLogRow.

    In this example, add two columns bucketName and key, and set their types to Object.

  6. The Mapping table will be populated with the defined columns automatically.

    In the Value column, enter globalMap.get("tGSList_2_CURRENT_BUCKET") for the bucketName column and globalMap.get("tGSList_2_CURRENT_KEY") for the key column. You can also press Ctrl + Space and then choose the appropriate variable.
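
On each iteration, tGSList refreshes the CURRENT_BUCKET and CURRENT_KEY variables in the globalMap, and tIterateToFlow evaluates the Value expressions to build one row per iteration. The sketch below simulates this mechanism with a plain HashMap standing in for the globalMap (an illustration of the iterate-to-flow pattern, not Talend's generated code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterateToFlowDemo {
    // Builds one "bucketName | key" row per iterated object, the way
    // tIterateToFlow turns per-iteration globalMap values into a flow.
    static List<String> toRows(String[][] objects) {
        Map<String, Object> globalMap = new HashMap<>();
        List<String> rows = new ArrayList<>();
        for (String[] obj : objects) {
            // Each iteration of tGSList_2 refreshes these two variables.
            globalMap.put("tGSList_2_CURRENT_BUCKET", obj[0]);
            globalMap.put("tGSList_2_CURRENT_KEY", obj[1]);
            // The Value expressions are evaluated once per iteration.
            Object bucketName = globalMap.get("tGSList_2_CURRENT_BUCKET");
            Object key = globalMap.get("tGSList_2_CURRENT_KEY");
            rows.add(bucketName + " | " + key);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[][] objects = {
            {"bighouse", "computer_02.txt"},
            {"study_room", "laptop.txt"}
        };
        toRows(objects).forEach(System.out::println);
    }
}
```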

  7. Double-click the tLogRow component to open its Basic settings view in the Component tab.

  8. Select Table (print values in cells of a table) for a better view of the results.

Closing the connection to Google Cloud Storage

  1. Double-click the tGSClose component to open its Basic settings view in the Component tab.

  2. Select the connection you want to close from the Component List.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Execute the Job by pressing F6 or clicking Run on the Run tab.

    The files in the three buckets are displayed. As expected, the files from the bucket bighouse were first copied to the bucket bed_room; the file computer_01.txt was then moved from the bucket bighouse to the bucket study_room and renamed laptop.txt; finally, the file computer_03.csv was deleted from the bucket bed_room.