Scenario: Managing files with Google Cloud Storage - 6.1

Talend Components Reference Guide

This scenario describes a Job that uploads files from a local directory to a bucket in Google Cloud Storage, then performs copy, move and delete operations on those files, and finally lists the files in the relevant buckets and displays them on the console.

Prerequisites: You have purchased a Google Cloud Storage account and created three buckets under the same Google project. In this example, the buckets created are bighouse, bed_room, and study_room.

Dropping and linking the components

To design the Job, proceed as follows:

  1. Drop the following components from the Palette onto the design workspace: one tGSConnection component, one tGSPut component, two tGSCopy components, one tGSDelete component, one tGSList component, one tIterateToFlow component, one tLogRow component and one tGSClose component.

  2. Connect tGSConnection to tGSPut using a Trigger > On Subjob Ok link.

  3. Connect tGSPut to the first tGSCopy using a Trigger > On Subjob Ok link.

  4. Do the same to connect the first tGSCopy to the second tGSCopy, connect the second tGSCopy to tGSDelete, connect tGSDelete to tGSList, and connect tGSList to tGSClose.

  5. Connect tGSList to tIterateToFlow using a Row > Iterate link.

  6. Connect tIterateToFlow to tLogRow using a Row > Main link.

Configuring the components

Opening a connection to Google Cloud Storage

  1. Double-click the tGSConnection component to open its Basic settings view in the Component tab.

  2. Navigate to the Google APIs Console in your web browser to access the Google project hosting the Cloud Storage services you need to use.

  3. Click Google Cloud Storage > Interoperable Access to open its view, and copy the access key and secret key.

  4. In the Component view of the Studio, paste the access key and secret key to the corresponding fields respectively.

Uploading files to Google Cloud Storage

  1. Double-click the tGSPut component to open its Basic settings view in the Component tab.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier.

  3. In the Bucket name field, enter the name of the bucket into which you want to upload files, bighouse in this example.

  4. In the Local directory field, browse to the directory from which the files will be uploaded, D:/Input/House in this example.

    The files under this directory include computer_01.txt and computer_03.csv, which are used later in this scenario.

  5. Leave other settings as they are.

Copying all files from one bucket to another bucket

  1. Double-click the first tGSCopy component to open its Basic settings view in the Component tab.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier.

  3. In the Source bucket name field, enter the name of the bucket from which you want to copy files, bighouse in this example.

  4. Select the Source is a folder check box. All files from the bucket bighouse will be copied.

  5. In the Target bucket name field, enter the name of the bucket into which you want to copy files, bed_room in this example.

  6. Select Copy from the Action list.

Moving a file from one bucket to another bucket and renaming it

  1. Double-click the second tGSCopy component to open its Basic settings view in the Component tab.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier.

  3. In the Source bucket name field, enter the name of the bucket from which you want to move files, bighouse in this example.

  4. In the Source object key field, enter the key of the object to be moved, computer_01.txt in this example.

  5. In the Target bucket name field, enter the name of the bucket into which you want to move files, study_room in this example.

  6. Select Move from the Action list. The specified source file computer_01.txt will be moved from the bucket bighouse to study_room.

  7. Select the Rename check box. In the New name field, enter a new name for the moved file. In this example, the new name is laptop.txt.

  8. Leave other settings as they are.

Deleting a file in one bucket

  1. Double-click the tGSDelete component to open its Basic settings view in the Component tab.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier.

  3. Select the Delete object from bucket list check box. Fill in the Bucket table with the details of the file that you want to delete.

    In this example, the file computer_03.csv will be deleted from the bucket bed_room, whose files were copied from the bucket bighouse.

Listing all files in the three buckets

  1. Double-click the tGSList component to open its Basic settings view in the Component tab.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier.

  3. Select the List objects in bucket list check box. In the Bucket table, enter the names of the three buckets in the Bucket name column: bighouse, study_room, and bed_room.

  4. Double-click the tIterateToFlow component to open its Basic settings view in the Component tab.

  5. Click Edit schema to define the data to pass on to tLogRow.

    In this example, add two columns, bucketName and key, and set their types to Object.

  6. The Mapping table will be populated with the defined columns automatically.

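    In the Value column, enter globalMap.get("tGSList_2_CURRENT_BUCKET") for the bucketName column and globalMap.get("tGSList_2_CURRENT_KEY") for the key column. Here tGSList_2 is the unique name of the tGSList component in this example; use the name shown for the component in your own Job. You can also press Ctrl + Space and then choose the appropriate variable.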

  7. Double-click the tLogRow component to open its Basic settings view in the Component tab.

  8. Select Table (print values in cells of a table) for a better view of the results.
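To make the Mapping expressions in step 6 more concrete, the following is a minimal stand-alone sketch of how the two globalMap.get(...) lookups could be evaluated once per iteration. It is not code generated by the Studio: the class IterateToFlowSketch, the method toRows and the local HashMap standing in for the Job's globalMap are hypothetical, and the assumption that tGSList publishes the current bucket and key before each iteration is inferred from the variable names used above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-alone illustration only; not Talend-generated code.
class IterateToFlowSketch {

    // Builds one {bucketName, key} row per listed object, mimicking the
    // Mapping table being evaluated once for each iteration.
    static List<Object[]> toRows(List<String[]> listedObjects) {
        // Hypothetical stand-in for the Job-wide globalMap.
        Map<String, Object> globalMap = new HashMap<>();
        List<Object[]> rows = new ArrayList<>();

        for (String[] object : listedObjects) {
            // Assumption: the list component stores the current bucket and
            // key under these names before each iteration.
            globalMap.put("tGSList_2_CURRENT_BUCKET", object[0]);
            globalMap.put("tGSList_2_CURRENT_KEY", object[1]);

            // The two expressions entered in the Value column.
            Object bucketName = globalMap.get("tGSList_2_CURRENT_BUCKET");
            Object key = globalMap.get("tGSList_2_CURRENT_KEY");

            rows.add(new Object[] {bucketName, key});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two objects taken from the end state described in this scenario.
        List<String[]> listed = List.of(
                new String[] {"bighouse", "computer_03.csv"},
                new String[] {"study_room", "laptop.txt"});

        for (Object[] row : toRows(listed)) {
            System.out.println(row[0] + "|" + row[1]);
        }
    }
}
```

Because both columns are typed as Object, the values returned by globalMap.get(...) can be passed on without a cast; a String column would need an explicit (String) cast in the Value expression.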

Closing the connection to Google Cloud Storage

  1. Double-click the tGSClose component to open its Basic settings view in the Component tab.

  2. Select the connection you want to close from the Component List.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Execute the Job by pressing F6 or clicking Run on the Run tab.

    The files in the three buckets are displayed on the console. As expected, the files from the bucket bighouse are first copied to the bucket bed_room. The file computer_01.txt is then moved from the bucket bighouse to the bucket study_room and renamed laptop.txt. Finally, the file computer_03.csv is deleted from the bucket bed_room.
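
The expected end state can be checked with a small in-memory model of the four operations. This is only a sketch under stated assumptions: the class BucketOperationsSketch and its run method are hypothetical, no call is made to Google Cloud Storage, the three buckets are assumed to be empty before the Job runs, and only the two files named in this scenario are modelled (the local directory may contain others).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory model of the Job's operations; no real storage calls are made.
class BucketOperationsSketch {

    static Map<String, Set<String>> run() {
        // Assumption: all three buckets start empty.
        Map<String, Set<String>> buckets = new LinkedHashMap<>();
        buckets.put("bighouse", new TreeSet<>());
        buckets.put("bed_room", new TreeSet<>());
        buckets.put("study_room", new TreeSet<>());

        // tGSPut: upload the local files to bighouse
        // (only the two files named in the scenario are modelled).
        buckets.get("bighouse").add("computer_01.txt");
        buckets.get("bighouse").add("computer_03.csv");

        // First tGSCopy: copy every object from bighouse to bed_room.
        buckets.get("bed_room").addAll(buckets.get("bighouse"));

        // Second tGSCopy: move computer_01.txt to study_room as laptop.txt.
        buckets.get("bighouse").remove("computer_01.txt");
        buckets.get("study_room").add("laptop.txt");

        // tGSDelete: remove computer_03.csv from bed_room.
        buckets.get("bed_room").remove("computer_03.csv");

        return buckets;
    }

    public static void main(String[] args) {
        // Print each bucket with its remaining keys, as tGSList would list them.
        for (Map.Entry<String, Set<String>> bucket : run().entrySet()) {
            System.out.println(bucket.getKey() + ": " + bucket.getValue());
        }
    }
}
```

Under these assumptions, bighouse keeps computer_03.csv, bed_room keeps computer_01.txt, and study_room holds laptop.txt.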