Scenario: Retrieving files from an Azure Storage container - 6.3

Talend Open Studio for Big Data Components Reference Guide


In this scenario, a five-component Job uses Azure Storage components to write files in a given Azure Storage system and then retrieve selected files (blobs in terms of Azure Storage) from that system.

Before replicating this scenario, you must have appropriate rights and permissions to read and write files in the Azure storage account to be used. For further information, see Microsoft's documentation for Azure Storage: http://azure.microsoft.com/en-us/documentation/services/storage/.

The talendcontainer container used in this scenario was created using tAzureStorageContainerCreate in the scenario Scenario: Creating a container in Azure Storage.

Linking the components

  1. In the Integration perspective of the Studio, create an empty Job, named azureTalend for example, from the Job Designs node in the Repository tree view.

    For further information about how to create a Job, see Talend Studio User Guide.

  2. Drop tAzureStorageConnection, tAzureStoragePut, tAzureStorageList, tJava and tAzureStorageGet onto the workspace.

  3. Connect the Azure Storage components using the Trigger > OnSubjobOk link, and connect tAzureStorageList to tJava using the Row > Iterate link.

Connecting to an Azure storage account

  1. Double-click tAzureStorageConnection to open its Component view.

  2. In the Account name field, enter the name of the storage account to be connected to. In this example, it is talendstorage, an account that has been created for demonstration purposes.

  3. In the Account key field, paste the primary or the secondary key associated with the storage account to be used. These keys can be found in the Manage Access Key dashboard in the Azure Storage system to be connected to.

  4. From the Protocol list, select the protocol for the endpoint of the storage account to be used. In this example, it is HTTPS.
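The three settings above correspond to the fields of a standard Azure Storage connection string. The sketch below only illustrates that layout; whether tAzureStorageConnection builds such a string internally is an assumption, and the class name, method name and key value are hypothetical placeholders.

```java
import java.util.Locale;

// Sketch only: shows how protocol, account name and account key fit into
// the usual Azure Storage connection-string layout.
class ConnectionStringSketch {

    // Joins the three connection settings into one connection string.
    static String build(String protocol, String accountName, String accountKey) {
        return "DefaultEndpointsProtocol=" + protocol.toLowerCase(Locale.ROOT)
                + ";AccountName=" + accountName
                + ";AccountKey=" + accountKey;
    }

    public static void main(String[] args) {
        // "<your-account-key>" is a placeholder, not a real key.
        System.out.println(build("HTTPS", "talendstorage", "<your-account-key>"));
    }
}
```

Keeping the key out of the Job itself, for example in a context variable, avoids exposing it when the Job is exported or shared.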

Writing files in Azure Storage

  1. Double-click tAzureStoragePut to open its Component view.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier. In this example, it is tAzureStorageConnection_1.

  3. In the Container name field, enter the name of the container you need to write files in. In this example, it is talendcontainer, a container created in the scenario Scenario: Creating a container in Azure Storage.

  4. In the Local folder field, enter the path, or browse, to the directory where the files to be used are stored. In this scenario, they are pictures illustrating technical processes, stored locally in E:/photos. Therefore, enter E:/photos; this allows tAzureStoragePut to upload all the files in this folder and its sub-folders into the talendcontainer container.

    For demonstration purposes, the example photos are organized as follows in the E:/photos folder:

    • Directly beneath the E:/photos level:

      components-use_case_triakinput_1.png

      components-use_case_triakinput_2.png

      components-use_case_triakinput_3.png

      components-use_case_triakinput_4.png

    • In the E:/photos/mongodb/step1 directory:

      components-use_case_tmongodbbulkload_1.png

      components-use_case_tmongodbbulkload_2.png

      components-use_case_tmongodbbulkload_3.png

      components-use_case_tmongodbbulkload_4.png

    • In the E:/photos/mongodb/step2 directory:

      components-use_case_tmongodbbulkload_5.png

      components-use_case_tmongodbbulkload_6.png

      components-use_case_tmongodbbulkload_7.png

      components-use_case_tmongodbbulkload_8.png

  5. In the Azure Storage folder field, enter the directory where you want to write files. This directory will be created in the container to be used if it does not exist. In this example, enter photos.

    If you enter nothing but leave the default quotation marks as they are, then files, as well as their local directory, will be written directly beneath the container level.
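To see which blob names this upload is expected to produce, the following sketch joins the Azure Storage folder with each file's path relative to the local folder. This mapping is an assumption inferred from the photos/ and photos/mongodb/ prefixes used later in this scenario, not a description of the component's internals; the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: derives a blob name from a local file path.
class BlobNameSketch {

    // Appends the file's path relative to localFolder to the Azure Storage
    // folder, always using "/" as the separator.
    static String blobName(String azureFolder, Path localFolder, Path file) {
        StringBuilder name = new StringBuilder(azureFolder);
        for (Path part : localFolder.relativize(file)) {
            name.append('/').append(part.toString());
        }
        return name.toString();
    }

    public static void main(String[] args) {
        Path localFolder = Paths.get("E:/photos");
        Path file = Paths.get(
                "E:/photos/mongodb/step1/components-use_case_tmongodbbulkload_1.png");
        System.out.println(blobName("photos", localFolder, file));
    }
}
```

Blob storage has no real directories, so the sub-folders mongodb/step1 and mongodb/step2 survive only as parts of the blob names.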

Verifying the file transfer

Configuring tAzureStorageList

  1. Double-click tAzureStorageList to open its Component view.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier. In this example, it is tAzureStorageConnection_1.

  3. In the Container name field, enter the name of the container in which you need to check whether the given files exist. In this scenario, it is talendcontainer.

  4. Under the Blob filter table, click the [+] button to add one row in the table.

  5. In the Prefix column, enter the common prefix of the names of the files (blobs) to be checked. This prefix represents the virtual directory level that serves as the starting point below which files (blobs) are checked. In this example, it is photos/.

    For further information about blob names, see http://msdn.microsoft.com/en-us/library/dd135715.aspx

  6. In the Include sub-directories column, select the check box in the newly added row. This allows tAzureStorageList to check all the files at any hierarchical level beneath the designated starting point.
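The effect of the Prefix and Include sub-directories settings can be approximated in plain Java as shown below. This is a sketch under assumptions: it treats "/" as the virtual directory delimiter and assumes that clearing the check box limits the result to blobs directly beneath the prefix. The class and method names are hypothetical and the blob names are shortened sample values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: imitates a prefix filter over a list of blob names.
class BlobFilterSketch {

    // Keeps the blob names that start with the prefix. When sub-directories
    // are excluded, names with a further "/" after the prefix are dropped.
    static List<String> filter(List<String> blobNames, String prefix,
                               boolean includeSubDirectories) {
        List<String> matches = new ArrayList<>();
        for (String name : blobNames) {
            if (!name.startsWith(prefix)) {
                continue;
            }
            String rest = name.substring(prefix.length());
            if (includeSubDirectories || rest.indexOf('/') < 0) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> blobs = Arrays.asList(
                "photos/triakinput_1.png",
                "photos/mongodb/step1/bulkload_1.png",
                "other/readme.txt");
        System.out.println(filter(blobs, "photos/", true));
        System.out.println(filter(blobs, "photos/", false));
    }
}
```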

Configuring tJava

  1. Double-click tJava to open its Component view.

  2. In the Code field, enter

    System.out.println();

  3. In the Outline panel, which, by default, is found to the left side of the Component view, expand the tAzureStorageList node.

  4. From the Outline panel, drop the CURRENT_BLOB global variable into the parentheses in the code in the Component view so as to make the code read:

    System.out.println(((String)globalMap.get("tAzureStorageList_1_CURRENT_BLOB")));
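The standalone sketch below imitates what the Row > Iterate link does in this Job: for each listed blob, the current blob name is stored under the tAzureStorageList_1_CURRENT_BLOB key and the tJava code then runs once. It assumes that the variable holds the blob name as a String. A plain HashMap stands in for the globalMap that the Studio provides, and the class name, method name and blob names are hypothetical sample values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: simulates one tJava execution per blob.
class IterateSketch {

    // Reads the current blob name the same way the tJava code does.
    static String currentBlob(Map<String, Object> globalMap) {
        return (String) globalMap.get("tAzureStorageList_1_CURRENT_BLOB");
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap object available inside a Job.
        Map<String, Object> globalMap = new HashMap<>();
        List<String> blobs = Arrays.asList(
                "photos/triakinput_1.png",
                "photos/mongodb/step1/bulkload_1.png");
        for (String blob : blobs) {
            // The list component sets the variable before each iteration.
            globalMap.put("tAzureStorageList_1_CURRENT_BLOB", blob);
            // Body of the tJava component.
            System.out.println(currentBlob(globalMap));
        }
    }
}
```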

Retrieving selected files

  1. Double-click tAzureStorageGet to open its Component view.

  2. Select the Use an existing connection check box and then select the connection you have configured earlier. In this example, it is tAzureStorageConnection_1.

  3. In the Container name field, enter the name of the container from which you need to retrieve files. In this scenario, it is talendcontainer.

  4. In the Local folder field, enter the path, or browse, to the directory where you want to put the retrieved files. In this example, it is E:/screenshots.

  5. Under the Blob table, click the [+] button to add one row in the table.

  6. In the Prefix column, enter the common name prefix of the files (blobs) to be retrieved. In this example, it is photos/mongodb/.

  7. In the Include sub-directories column, select the check box in the newly added row. This allows tAzureStorageGet to retrieve all the files (blobs) beneath the photos/mongodb/ level.

  8. In the Create parent directories column, select the check box in the newly added row to create the same directory in the specified local folder as the retrieved blobs have in the container.

    Note that having this same directory is necessary for successfully retrieving blobs. If you leave this check box clear, then you need to create the same directory yourself in the target local folder.
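The relationship between a blob name and its local target can be sketched as follows: each "/"-separated part of the blob name becomes one path element beneath the local folder, and the parent directories must exist before the file is written. This illustrates the idea only and is not the component's actual implementation; the class and method names are hypothetical, and a temporary directory stands in for E:/screenshots.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: maps a blob name to a local file path.
class LocalTargetSketch {

    // Turns each "/"-separated part of the blob name into one path element.
    static Path localTarget(Path localFolder, String blobName) {
        Path target = localFolder;
        for (String part : blobName.split("/")) {
            target = target.resolve(part);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for E:/screenshots.
        Path localFolder = Files.createTempDirectory("screenshots");
        Path target = localTarget(localFolder,
                "photos/mongodb/step1/components-use_case_tmongodbbulkload_1.png");
        // Counterpart of selecting the Create parent directories check box.
        Files.createDirectories(target.getParent());
        System.out.println(target);
    }
}
```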

Executing the Job

  • Press F6 to run this Job.

Once done, the Run view is opened automatically, where you can check the execution result.

The output shows that the Job returns the list of the blobs with the photos/ prefix in the container.

This can also be seen in the web console of the Azure storage account.

In the specified local folder, the blobs with the photos/mongodb/ prefix have been retrieved and their prefix transformed to directories.