Scenario 1: Reading data from the cache memory for high-speed data access - 6.3

Talend Components Reference Guide

The following Job reads a large amount of data from the cache memory, where it was loaded by two tHashOutput components, and passes it to a tFileOutputDelimited component. The goal of this scenario is to show how quickly a large volume of data can be written and read. In practice, a data feed generated in this way can be used as lookup table input in use cases where a large amount of data needs to be referenced.

Dropping and linking the components

  1. Drag and drop the following components from the Palette to the workspace: tFixedFlowInput (X2), tHashOutput (X2), tHashInput and tFileOutputDelimited.

  2. Connect the first tFixedFlowInput to the first tHashOutput using a Row > Main link.

  3. Connect the second tFixedFlowInput to the second tHashOutput using a Row > Main link.

  4. Connect the first subjob (from tFixedFlowInput_1) to the second subjob (to tFixedFlowInput_2) using an OnSubjobOk link.

  5. Connect tHashInput to tFileOutputDelimited using a Row > Main link.

  6. Connect the second subjob to the last subjob using an OnSubjobOk link.
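As a rough illustration of how the OnSubjobOk links order the three subjobs, the following Java sketch runs a list of tasks one after another and stops at the first failure. It is a conceptual model only, not code generated by Talend Studio; the class name SubjobChainSketch, the run method, and the subjob labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only; this is not code generated by Talend Studio.
class SubjobChainSketch {
    // Runs the subjobs in insertion order and returns the names of those that completed.
    // A subjob that throws stops the chain, so later subjobs are never triggered.
    static List<String> run(Map<String, Runnable> subjobs) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Runnable> subjob : subjobs.entrySet()) {
            try {
                subjob.getValue().run();
            } catch (RuntimeException e) {
                break;
            }
            completed.add(subjob.getKey());
        }
        return completed;
    }

    public static void main(String[] args) {
        // Hypothetical labels for the three subjobs of this scenario.
        Map<String, Runnable> job = new LinkedHashMap<>();
        job.put("load cache from tFixedFlowInput_1", () -> {});
        job.put("load cache from tFixedFlowInput_2", () -> {});
        job.put("read cache and write file", () -> {});
        System.out.println(run(job));
    }
}
```

The point of the ordering is that the cache is fully loaded by both inputs before the last subjob starts reading it.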

Configuring the components

Configuring data inputs and hash cache
  1. Double-click the first tFixedFlowInput component to display its Basic settings view.

  2. Select Built-In from the Schema drop-down list.

    Note

    You can select Repository from the Schema drop-down list to fill in the relevant fields automatically if the relevant metadata has been stored in the Repository. For more information about Metadata, see the Talend Studio User Guide.

  3. Click Edit schema to define the data structure of the input flow. In this case, the input has two columns: ID and ID_Insurance. Then click OK to close the dialog box.

  4. In the Number of rows field, enter the number of entries to output, e.g. 50000.

  5. Select the Use Single Table check box. In the Value column of the Values table, assign values to the columns, e.g. 1 for ID and 3 for ID_Insurance.

  6. Configure the second tFixedFlowInput component in the same way, changing only the values: 2 for ID and 4 for ID_Insurance in this case.

  7. Double-click the first tHashOutput to display its Basic settings view.

  8. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema from the previous component. Select Keep all from the Keys management drop-down list and keep the Append check box selected.

  9. Perform the same operations for the second tHashOutput component, and select the Link with a tHashOutput check box.
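The effect of the settings above can be pictured with a minimal Java sketch in which two loads append rows to one shared in-memory list. This is a conceptual model under stated assumptions, not Talend's implementation: the class HashCacheSketch, its Row type, and the load method are hypothetical, "Keep all" is modeled as never deduplicating rows, and "Append" as never clearing the list between loads.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model only; this is not code generated by Talend Studio.
class HashCacheSketch {
    // One row of the scenario's schema: ID and ID_Insurance.
    static final class Row {
        final int id;
        final int idInsurance;

        Row(int id, int idInsurance) {
            this.id = id;
            this.idInsurance = idInsurance;
        }
    }

    // Stands in for the cache shared by the two linked tHashOutput components.
    static final List<Row> cache = new ArrayList<>();

    // Mimics a tFixedFlowInput feeding a tHashOutput: the same row is emitted `count` times.
    // Rows are always appended and never deduplicated (assumed reading of Append and Keep all).
    static void load(int count, int id, int idInsurance) {
        for (int i = 0; i < count; i++) {
            cache.add(new Row(id, idInsurance));
        }
    }

    public static void main(String[] args) {
        load(50_000, 1, 3); // first subjob
        load(50_000, 2, 4); // second subjob, writing to the same cache
        System.out.println(cache.size()); // prints 100000
    }
}
```

With the example values used in this scenario, the model ends up holding 100000 rows, 50000 from each input.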

Configuring data retrieval from hash cache and data output
  1. Double-click tHashInput to display its Basic settings view.

  2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure, which is the same as that of tHashOutput.

  3. Select tHashOutput_1 from the Component list drop-down list.

  4. Double-click tFileOutputDelimited to display its Basic settings view.

  5. Select Built-In from the Property Type drop-down list. In the File Name field, enter the full path and name of the file, e.g. "E:/Allr70207V5.0/Talend-All-r70207-V5.0.0NB/workspace/out.csv".

  6. Select the Include Header check box and click Sync columns to retrieve the schema from the previous component.
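The last subjob can be approximated by the following Java sketch, which reads rows from an in-memory list and writes them to a delimited file with a header line. It is an illustration only, not the code the Studio generates: the class DelimitedOutputSketch and its write method are hypothetical, the semicolon field separator is an assumption (the scenario does not state one), and a temporary file stands in for the output path.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Conceptual model only; this is not code generated by Talend Studio.
class DelimitedOutputSketch {
    // Writes a header line, then one "ID;ID_Insurance" line per cached row.
    // The ";" separator is an assumption made for this sketch.
    static void write(Path file, List<int[]> rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            out.write("ID;ID_Insurance");
            out.newLine();
            for (int[] row : rows) {
                out.write(row[0] + ";" + row[1]);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Rebuild the cache content used in this scenario: 50000 rows per input.
        List<int[]> cache = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) cache.add(new int[] {1, 3});
        for (int i = 0; i < 50_000; i++) cache.add(new int[] {2, 4});

        Path file = Files.createTempFile("out", ".csv");
        write(file, cache);
        System.out.println(Files.readAllLines(file).size()); // prints 100001 (header + rows)
    }
}
```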

Saving and executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6, or click Run on the Run tab to execute the Job.

    The result shows that a large number of entries are written to the cache and read back very rapidly.