Scenario 2: Reusing a stored cookie to fetch files through HTTP - 6.1

Talend Components Reference Guide

This scenario describes a two-component Job that logs in to a given website over HTTP and then, using the cookie stored in a user-defined local directory, fetches data from that website.

Dropping and linking components

  1. Drop two tFileFetch components onto your design workspace.

  2. Link the two components as subjobs using a Trigger > On Subjob Ok connection.
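The effect of the On Subjob Ok trigger can be pictured in plain Java: the second subjob starts only if the first one finishes without an error. The sketch below is an illustration only, not the code Talend generates; the class name `SubjobChainSketch` and the method `runOnSubjobOk` are hypothetical.

```java
import java.util.concurrent.Callable;

public class SubjobChainSketch {

    // Run the second subjob only if the first one finishes without an exception.
    // Returns true when both subjobs completed.
    static boolean runOnSubjobOk(Callable<?> first, Callable<?> second) {
        try {
            first.call();
        } catch (Exception e) {
            return false; // first subjob failed, so the trigger never fires
        }
        try {
            second.call();
            return true;
        } catch (Exception e) {
            return false; // second subjob started but failed
        }
    }
}
```

In this scenario, the first callable would stand for the login subjob (tFileFetch_1) and the second for the download subjob (tFileFetch_2).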

Configuring the components

Configuring the first subjob

  1. Double-click tFileFetch_1 to open its Component view.

  2. Select the protocol you want to use from the Protocol list. Here, we use the https protocol.

  3. In the URI field, type in the URI through which you log in to the website and fetch the resulting web page. In this example, the URI is https://www.codeproject.com/script/Membership/LogOn.aspx?download=true.

  4. In the Destination directory field, browse to the folder where the fetched web page is to be stored. This folder will be created on the fly if it does not exist. In this example, type in D:/download.

  5. In the Destination Filename field, type in a new name for the file if you want it to be changed. In this example, it is codeproject.html.

  6. Under the Parameters table, click the plus button to add two rows and fill in the credentials for accessing the desired website.

    In the Name column, type in a name for each of the two rows. In this example, they are Email and Password, as required by the website you are logging in to.

    In the Value column, type in the authentication information.

  7. Select the Save cookie check box.

  8. In the Cookie file field, type in the full path to the file which you want to use to save the cookie. In this example, it is D:/download/cookie.

  9. Click Advanced settings to open its view.

  10. Select the Support redirection check box so that the redirection request will be repeated until the redirection is successful.
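For readers who want to see what this configuration amounts to outside the Studio, here is a minimal Java sketch of the same idea: POST the two parameters as a form, follow redirects, store the returned page, and write the received cookies to a file. It is not the code tFileFetch generates. The class name `LoginSketch`, its method names, and the one `name=value` pair per line cookie file format are assumptions made for illustration; the real component may store cookies differently. The sketch uses the standard `java.net.http.HttpClient` API (Java 11 or later).

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.HttpCookie;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class LoginSketch {

    // Encode the Parameters rows (Name/Value) as a form body for the POST request.
    static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> row : params.entrySet()) {
            body.add(URLEncoder.encode(row.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(row.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Save each cookie as one "name=value" line (hypothetical file format).
    static void saveCookies(List<HttpCookie> cookies, Path cookieFile) throws IOException {
        List<String> lines = new ArrayList<>();
        for (HttpCookie cookie : cookies) {
            lines.add(cookie.getName() + "=" + cookie.getValue());
        }
        // Create the destination folder on the fly if it does not exist.
        Files.createDirectories(cookieFile.toAbsolutePath().getParent());
        Files.write(cookieFile, lines, StandardCharsets.UTF_8);
    }

    // POST the credentials, follow redirects, then store the page and the cookies.
    static void login(URI uri, Map<String, String> params, Path destFile, Path cookieFile)
            throws IOException, InterruptedException {
        CookieManager cookieManager = new CookieManager();
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(cookieManager)                 // collects cookies from responses
                .followRedirects(HttpClient.Redirect.NORMAL)  // comparable to Support redirection
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(params)))
                .build();
        Files.createDirectories(destFile.toAbsolutePath().getParent());
        client.send(request, HttpResponse.BodyHandlers.ofFile(destFile));
        saveCookies(cookieManager.getCookieStore().getCookies(), cookieFile);
    }
}
```

With the values of this example, `login` would be called with the LogOn.aspx URI, a map holding the Email and Password entries, `Path.of("D:/download/codeproject.html")` as the destination file, and `Path.of("D:/download/cookie")` as the cookie file.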

Configuring the second subjob

  1. Double-click tFileFetch_2 to open its Component view.

  2. From the Protocol list, select http.

  3. In the URI field, type in the address from which you want to fetch the files. In this example, the address is http://www.codeproject.com/script/articles/download.aspx?file=/KB/DLL/File_List_Downloader/FLD02June2011_Source.zip&rp=http://www.codeproject.com/Articles/203991/File-List-Downloader.

  4. In the Destination directory field, type in the directory or browse to the folder where you want to store the fetched files. This folder is created automatically at execution time if it does not exist. In this example, type in D:/download.

  5. In the Destination Filename field, type in a new name for the file if you want it to be changed. In this example, it is source.zip.

  6. Clear the POST method check box to deactivate the Parameters table.

  7. Select the Read cookie check box.

  8. In the Cookie file field, browse to the file which is used to save the cookie. In this example, it is D:/download/cookie.
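As a rough equivalent outside the Studio, the second subjob can be pictured as a plain GET request that sends the stored cookie back to the server. The Java sketch below is an illustration under stated assumptions, not the code tFileFetch generates: the class name `FetchWithCookieSketch` and its methods are hypothetical, and the cookie file is assumed to hold one `name=value` pair per line, which may differ from the format the component actually writes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.StringJoiner;

public class FetchWithCookieSketch {

    // Turn the saved "name=value" lines into a single Cookie request header value.
    static String cookieHeader(Path cookieFile) throws IOException {
        StringJoiner header = new StringJoiner("; ");
        for (String line : Files.readAllLines(cookieFile, StandardCharsets.UTF_8)) {
            if (!line.isBlank()) {
                header.add(line.trim());
            }
        }
        return header.toString();
    }

    // GET the file (no POST parameters), sending the stored cookie with the request.
    static Path fetch(URI uri, Path cookieFile, Path destFile)
            throws IOException, InterruptedException {
        // Create the destination folder if it does not exist yet.
        Files.createDirectories(destFile.toAbsolutePath().getParent());
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Cookie", cookieHeader(cookieFile))
                .GET()
                .build();
        // Stream the response body straight into the destination file.
        return client.send(request, HttpResponse.BodyHandlers.ofFile(destFile)).body();
    }
}
```

With the values of this example, `fetch` would be called with the download.aspx address, `Path.of("D:/download/cookie")` as the cookie file, and `Path.of("D:/download/source.zip")` as the destination file.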

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to execute the Job.

    Then, go to the local directory D:/download to check the downloaded file.