Scenario 2: Reading data from a remote file in streaming mode - 6.1

Talend Components Reference Guide

This scenario describes a four-component Job that fetches data from a large remote file and displays it in the Run view almost as soon as it is read. The advantage of this technique is that you do not have to wait for the entire file to be downloaded before viewing the data.

Dropping and linking components

  1. Drop the following components onto the workspace: tFileFetch, tSleep, tFileInputDelimited, and tLogRow.

  2. Connect tSleep and tFileInputDelimited using a Trigger > OnComponentOk link and connect tFileInputDelimited to tLogRow using a Row > Main link.

Configuring the components

  1. Double-click tFileFetch to display the Basic settings tab in the Component view and set the properties.

  2. From the Protocol list, select the appropriate protocol to access the server on which your data is stored.

  3. In the URI field, enter the URI required to access the server on which your file is stored.

  4. Select the Use cache to save the resource check box to add your file data to the cache memory. This option allows you to use the streaming mode to transfer the data.

  5. In the workspace, click tSleep to display the Basic settings tab in the Component view and set the properties.

    By default, tSleep's Pause field is set to 1 second. Do not change this setting. It pauses the second subJob in order to give the first subJob, which contains tFileFetch, time to start reading the file data.

  6. In the workspace, double-click tFileInputDelimited to display its Basic settings tab in the Component view and set the properties.

  7. In the File name/Stream field:

    - Delete the default content.

    - Press Ctrl+Space to view the variables available for this component.

    - Select tFileFetch_1_INPUT_STREAM from the auto-completion list, to add the following variable to the Filename field: ((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM")).

  8. From the Schema list, select Built-in and click [...] next to the Edit schema field to describe the structure of the file that you want to fetch. The US_Employees file is composed of six columns: ID, Employee, Age, Address, State, EntryDate.

    Click [+] to add the six columns and name them as listed above. Click OK.

  9. In the workspace, double-click tLogRow to display its Basic settings in the Component view and click Sync Columns to ensure that the schema structure is properly retrieved from the preceding component.
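The File name/Stream expression in step 7 simply retrieves the open input stream that tFileFetch stored in the Job's globalMap and hands it to tFileInputDelimited for row-by-row parsing. The following standalone sketch mimics that mechanism outside Talend; the HashMap standing in for globalMap and the sample semicolon-delimited data are illustrative assumptions, not generated Job code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class StreamReadSketch {
    public static void main(String[] args) throws Exception {
        // Stand-in for Talend's globalMap: tFileFetch registers the fetched
        // resource under the key "tFileFetch_1_INPUT_STREAM".
        Map<String, Object> globalMap = new HashMap<>();
        String data = "1;Smith;35;12 Main St;CA;2009-04-01\n"
                    + "2;Jones;41;8 Oak Ave;NY;2011-09-15\n";
        globalMap.put("tFileFetch_1_INPUT_STREAM",
                new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8)));

        // This cast is what the File name/Stream expression performs:
        // ((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM"))
        InputStream in = (InputStream) globalMap.get("tFileFetch_1_INPUT_STREAM");

        // tFileInputDelimited then parses the stream row by row.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(";");
                System.out.println(fields[0] + " | " + fields[1]);
            }
        }
    }
}
```

Because the component reads from a live stream rather than a file path, parsing can begin before the download has finished, which is what makes the streaming mode possible.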

Configuring Job execution and executing the Job

  1. Click the Job tab and then the Extra view.

  2. Select the Multi thread execution check box in order to run the two subJobs at the same time. Bear in mind that the second subJob starts with a one-second delay, according to the properties set in tSleep. This option allows you to fetch the data almost as soon as it is read by tFileFetch, thanks to the tFileInputDelimited component.

  3. Save the Job and press F6 to run it.

    The data is displayed in the console almost as soon as it is read.
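The principle behind the multi-thread setup can be sketched with two plain Java threads sharing a piped stream: a "fetcher" thread writes rows as they arrive while a "reader" thread, delayed by one second like tSleep, consumes them before the download is complete. The thread names, row contents, and timings below are illustrative assumptions, not code generated by the Studio.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

public class MultiThreadSketch {
    public static void main(String[] args) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        // "tFileFetch" side: writes data into the shared stream as it arrives.
        Thread fetcher = new Thread(() -> {
            try {
                for (int i = 1; i <= 3; i++) {
                    out.write(("row" + i + "\n").getBytes(StandardCharsets.UTF_8));
                    out.flush();
                    Thread.sleep(200); // simulate a slow download
                }
                out.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        // "tSleep + tFileInputDelimited" side: starts one second later and
        // consumes rows while the fetcher may still be writing.
        Thread reader = new Thread(() -> {
            try {
                Thread.sleep(1000); // the tSleep pause
                BufferedReader br = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8));
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println("read: " + line);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        fetcher.start();
        reader.start();
        fetcher.join();
        reader.join();
    }
}
```

As in the Job, the one-second pause simply gives the producing side a head start; the consuming side then processes data from the stream without waiting for the producer to finish.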

For a scenario concerning the use of dynamic schemas in tFileInputDelimited, see Scenario 4: Writing dynamic columns from a MySQL database to an output file.