Scenario 1: Fetching data through HTTP - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario describes a three-component Job that retrieves a file from a website over HTTP, reads the data in the fetched file and displays it on the console.

Dropping and linking components

  1. Drop a tFileFetch, a tFileInputDelimited and a tLogRow onto your design workspace.

  2. Link tFileFetch to tFileInputDelimited using a Trigger > On Subjob Ok or On Component Ok connection.

  3. Link tFileInputDelimited to tLogRow using a Row > Main connection.
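The three components and their two connections can be pictured as a small pipeline. The Python sketch below is only an illustration, not the code Talend generates. `run_job` and the three callables are hypothetical names, and the On Subjob Ok trigger is modeled simply as "start the second subjob only if the first one returned without raising".

```python
from typing import Callable, Dict, Iterable

# Hypothetical stand-ins for the three components (illustration only).
FetchStep = Callable[[], str]                       # tFileFetch: returns the local path
ReadStep = Callable[[str], Iterable[Dict[str, str]]]  # tFileInputDelimited: yields rows
LogStep = Callable[[Dict[str, str]], None]          # tLogRow: consumes one row


def run_job(fetch: FetchStep, read: ReadStep, log: LogStep) -> int:
    """Run the fetch subjob, then read and log every row; return the row count.

    If fetch() raises, the exception propagates and read() is never called,
    which is how this sketch models the On Subjob Ok trigger.
    """
    path = fetch()             # subjob 1: tFileFetch
    count = 0
    for row in read(path):     # subjob 2: tFileInputDelimited --Row > Main--> tLogRow
        log(row)
        count += 1
    return count
```

Passing plain functions keeps each "component" independently testable; in Studio the same separation comes from the components themselves and the connections you drew.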

Configuring the components

  1. Double-click tFileFetch to open its Basic settings view.

  2. Select the protocol you want to use from the list. Here, http is selected.

  3. In the URI field, enter the URI of the file to fetch. You can paste this URI into your browser to check the data in the file before running the Job.

  4. In the Destination directory field, browse to the folder where the fetched file is to be stored. In this example, it is D:/Output.

  5. In the Destination filename field, enter a new name for the fetched file if you want to rename it. In this example, it is new.txt.

  6. If needed, select the Add header check box and define one or more HTTP request headers as fetch conditions. For example, to fetch the file only if it has been modified since 19:43:31 GMT on October 29, 1994, fill in the Name and Value fields of the Headers table with "If-Modified-Since" and "Sat, 29 Oct 1994 19:43:31 GMT" respectively. For details about HTTP request header definitions, see Header Field Definitions.

  7. Double-click tFileInputDelimited to open its Basic settings view.

  8. In the File name field, enter the full path of the locally stored file. In this example, it is D:/Output/new.txt.

  9. Click the [...] button next to Edit schema to open the [Schema] dialog box. In this example, add one column, output, to hold the data read from the fetched file.

  10. Leave other settings as they are.
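To make the tFileFetch settings concrete (http protocol, URI, destination directory and filename, optional request headers), here is a rough equivalent using only Python's standard library. It is a minimal sketch, not Talend's implementation: `http_date`, `build_request` and `fetch_file` are hypothetical names, and returning `None` on a 304 Not Modified reply is an assumption of this sketch about how an unmet If-Modified-Since condition should be handled.

```python
import os
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional


def http_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP-date in GMT (If-Modified-Since format)."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def build_request(uri: str, headers: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build a GET request carrying the optional Name/Value pairs of the Headers table."""
    req = urllib.request.Request(uri, method="GET")
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    return req


def fetch_file(uri: str, dest_dir: str, dest_name: str,
               headers: Optional[Dict[str, str]] = None,
               timeout: float = 30.0) -> Optional[str]:
    """Download uri to dest_dir/dest_name and return the local path.

    Returns None if the server answers 304 Not Modified (assumed handling).
    """
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, dest_name)
    try:
        with urllib.request.urlopen(build_request(uri, headers), timeout=timeout) as resp:
            data = resp.read()   # read the whole body before touching the target file
    except urllib.error.HTTPError as err:
        if err.code == 304:      # condition not met: the file has not changed
            return None
        raise
    with open(target, "wb") as out:
        out.write(data)
    return target
```

With this example's settings, a call might look like `fetch_file(uri, "D:/Output", "new.txt", {"If-Modified-Since": http_date(datetime(1994, 10, 29, 19, 43, 31, tzinfo=timezone.utc))})`, where `uri` stands for the URI you entered in step 3. Because the file is written only after the whole response has been read, a 304 reply or a failed download leaves any earlier copy of new.txt untouched.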

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to execute the Job.

    The data from the fetched file is displayed on the console.
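The read-and-display half of the Job (tFileInputDelimited feeding tLogRow) can be approximated as follows. This is a hedged sketch: `read_delimited` and `log_rows` are hypothetical helpers, the ";" field separator and "|" output separator are assumed defaults, and a schema column with no matching field is filled with an empty string by choice of this sketch, not as documented Talend behavior.

```python
from typing import Dict, Iterable, Iterator, List


def read_delimited(path: str, columns: List[str], field_sep: str = ";",
                   encoding: str = "utf-8") -> Iterator[Dict[str, str]]:
    """Yield one row per line, mapping schema columns to the split fields.

    Fields beyond the schema are dropped; missing fields become "".
    """
    with open(path, encoding=encoding) as f:
        for line in f:
            fields = line.rstrip("\r\n").split(field_sep)
            yield {col: fields[i] if i < len(fields) else ""
                   for i, col in enumerate(columns)}


def log_rows(rows: Iterable[Dict[str, str]], sep: str = "|") -> int:
    """Print each row's values joined by sep and return the number of rows printed."""
    count = 0
    for row in rows:
        print(sep.join(row.values()))
        count += 1
    return count
```

Calling `log_rows(read_delimited("D:/Output/new.txt", ["output"]))` prints one line per line of the fetched file. With a one-column schema, a line that contains ";" keeps only the text before the first separator in this sketch; if your file contains that character, adjust `field_sep` here, or the Field separator setting in Studio.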