Storing the result of the input flow in a temporary location - Cloud - 8.0

Technical

Version
Cloud
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-20

In this Job, the results of the input flow are stored in a temporary location, either a file or memory (cache). This reduces processing time when you handle large data sets or when your input flow is complex.
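The two storage options can be pictured with a minimal plain-Java sketch. This is not code generated by Talend Studio: the class name, method names, and the temporary-file prefix and suffix are assumptions chosen for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names, not part of any Talend API.
class TemporaryStorageSketch {

    // Option 1: keep the intermediate rows in memory (the "cache" option).
    static List<String> storeInMemory(List<String> rows) {
        return new ArrayList<>(rows); // copy held on the heap until the Job ends
    }

    // Option 2: write the intermediate rows to a temporary file, one row per line.
    static Path storeInTempFile(List<String> rows) {
        try {
            Path tmp = Files.createTempFile("input-flow-", ".csv"); // assumed prefix/suffix
            Files.write(tmp, rows, StandardCharsets.UTF_8);
            tmp.toFile().deleteOnExit(); // clean up when the JVM exits
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the rows back from the temporary file in a later step.
    static List<String> readFromTempFile(Path tmp) {
        try {
            return Files.readAllLines(tmp, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("1;Alice", "2;Bob");
        List<String> cached = storeInMemory(rows);
        Path tmp = storeInTempFile(rows);
        // Both storage options return the same rows.
        System.out.println(cached.equals(readFromTempFile(tmp)));
    }
}
```

The in-memory option avoids disk I/O but holds every row on the heap, while the file option trades speed for a smaller memory footprint.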

This Job uses the following components:

  • A tFileInputDelimited, a tReplicate, and two tMap components to create two input flows.
  • Two tHashOutput and two tHashInput components to store the results in a temporary location and read them back.
  • A third tMap component and a tLogRow to print the results in the console.
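The overall shape of this Job can be sketched in plain Java as follows. Each method is only a stand-in for the component named in its comment. The mappings themselves (upper-casing a name, computing its length, joining rows with `|`) and the `id;name` input layout are hypothetical, since the actual tMap settings are not described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative only: mirrors the shape of the Job, not Talend-generated code.
class JobFlowSketch {

    // Stand-in for tFileInputDelimited: split each "id;name" line into fields.
    static List<String[]> readDelimited(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(";"));
        }
        return rows;
    }

    // Stand-in for the first tMap (hypothetical mapping): upper-case the name.
    static List<String> mapUpper(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        for (String[] r : rows) {
            out.add(r[0] + ";" + r[1].toUpperCase(Locale.ROOT));
        }
        return out;
    }

    // Stand-in for the second tMap (hypothetical mapping): length of the name.
    static List<String> mapLength(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        for (String[] r : rows) {
            out.add(r[0] + ";" + r[1].length());
        }
        return out;
    }

    // Stand-in for the third tMap (hypothetical mapping): pair rows by position.
    static List<String> combine(List<String> first, List<String> second) {
        List<String> out = new ArrayList<>();
        int n = Math.min(first.size(), second.size());
        for (int i = 0; i < n; i++) {
            out.add(first.get(i) + "|" + second.get(i));
        }
        return out;
    }

    static List<String> run(List<String> lines) {
        List<String[]> rows = readDelimited(lines); // the input is read only once
        List<String> cache1 = mapUpper(rows);       // tReplicate: the same rows feed both maps
        List<String> cache2 = mapLength(rows);      // each result is held in memory
        return combine(cache1, cache2);             // later step reads both cached results
    }

    public static void main(String[] args) {
        // Stand-in for tLogRow: print each resulting row to the console.
        for (String row : run(Arrays.asList("1;alice", "2;bob"))) {
            System.out.println(row);
        }
    }
}
```

With the sample input above, `main` prints `1;ALICE|1;5` and `2;BOB|2;3`.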

Procedure

  1. Create two input flows as shown above by adding the tFileInputDelimited, tReplicate, tMap, and tHashOutput components to the workspace and connecting them with Row > Main links.
    Note: tHashInput and tHashOutput are components from the Technical family and are hidden by default. For more information, see Where can I find the tHashInput and tHashOutput components.
  2. Use either two tFileOutputDelimited components or two tHashOutput components to store the results from tMap_1 and tMap_2 in a temporary location.
  3. In the next subJob, read the data back, either from the temporary file using a tFileInputDelimited component or from memory using a tHashInput component. The Job example above caches the results in memory.
  4. In the Basic settings view of tHashInput_1, select tHashOutput_1 from the Component list drop-down list.

    This configuration links tHashInput_1 to tHashOutput_1.

    Tip: tHashOutput_1 caches the result of tMap_1 in memory, and tHashOutput_2 caches the result of tMap_2. For the data to be retrieved from memory, tHashInput_1 must be linked to tHashOutput_1, and tHashInput_2 to tHashOutput_2.
  5. In the Basic settings view of tHashInput_2, select tHashOutput_2 from the Component list drop-down list.

    This configuration links tHashInput_2 to tHashOutput_2.

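The link configured in steps 4 and 5 can be pictured as a lookup by name: the output side stores rows under its own component name, and the input side retrieves them by the name selected in its Component list. The following is a minimal plain-Java sketch under that assumption. The class, the methods, and the error message are hypothetical and do not reflect how Talend Studio implements these components internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a name-keyed in-memory cache, not Talend's implementation.
class HashLinkSketch {

    // Shared cache: one entry per output component name.
    private static final Map<String, List<String>> CACHE = new HashMap<>();

    // Stand-in for a tHashOutput component: store rows under its own name.
    static void hashOutput(String outputName, List<String> rows) {
        CACHE.put(outputName, new ArrayList<>(rows));
    }

    // Stand-in for a tHashInput component: read rows from the linked output.
    static List<String> hashInput(String linkedOutputName) {
        List<String> rows = CACHE.get(linkedOutputName);
        if (rows == null) {
            // Nothing was cached under that name, so the link points nowhere.
            throw new IllegalStateException("No cached rows for " + linkedOutputName);
        }
        return rows;
    }

    public static void main(String[] args) {
        hashOutput("tHashOutput_1", Arrays.asList("row from tMap_1"));
        hashOutput("tHashOutput_2", Arrays.asList("row from tMap_2"));

        // tHashInput_1 is linked to tHashOutput_1, tHashInput_2 to tHashOutput_2.
        System.out.println(hashInput("tHashOutput_1"));
        System.out.println(hashInput("tHashOutput_2"));
    }
}
```

In this sketch, reading from a name that was never written fails immediately, which illustrates why each input component must be linked to the matching output component.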