tHashInput - 6.3

Talend Components Reference Guide

Version: 6.3
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for Data Quality, Talend Open Studio for ESB, Talend Open Studio for MDM, Talend Real-Time Big Data Platform
Tasks: Data Governance, Data Quality and Preparation, Design and Development
Platform: Talend Studio

The components of the Technical family are hidden from the Palette by default. For more information about how to show them on the Palette, see the Talend Studio User Guide.

Function

tHashInput reads data that tHashOutput has loaded into cache memory, providing a high-speed data stream.

Purpose

This component reads data that tHashOutput has loaded into cache memory, providing a high-speed data feed for transactions that involve a large amount of data.
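
The idea behind the tHashOutput/tHashInput pair can be modeled in plain Java: one component keeps rows in a shared in-memory structure and another reads them back from memory instead of from a file or database. This is a minimal sketch under stated assumptions: the class `HashCacheSketch`, its static map, and the use of the component label as the cache key are all hypothetical and do not reflect the code that Talend Studio actually generates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a tHashOutput/tHashInput pair; not the code Talend Studio generates.
class HashCacheSketch {
    // Shared in-memory cache, keyed (by assumption) by the label of the component that loaded the rows.
    static final Map<String, List<Object[]>> CACHE = new HashMap<>();

    // Plays the role of tHashOutput: keeps one row in memory.
    static void write(String outputLabel, Object[] row) {
        CACHE.computeIfAbsent(outputLabel, k -> new ArrayList<>()).add(row);
    }

    // Plays the role of tHashInput: returns the rows cached by the given output, read-only.
    static List<Object[]> read(String outputLabel) {
        List<Object[]> rows = CACHE.get(outputLabel);
        return rows == null ? Collections.<Object[]>emptyList() : Collections.unmodifiableList(rows);
    }

    public static void main(String[] args) {
        write("tHashOutput_1", new Object[] {1, 3});
        write("tHashOutput_1", new Object[] {2, 4});
        for (Object[] row : read("tHashOutput_1")) {
            System.out.println(row[0] + ";" + row[1]); // rows come straight from memory
        }
    }
}
```

Because nothing is written to disk between the two components, reading is limited mainly by memory access, which is why the pair suits large data volumes as long as the data fits in memory.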

tHashInput Properties

Component family

Technical

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

This component supports the dynamic schema feature, which allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. For further information about dynamic schemas, see the Talend Studio User Guide.

The dynamic schema feature is designed for retrieving unknown columns of a table and should be used for this purpose only; it is not recommended for creating tables.

Built-in: The schema is created and stored locally for this component only. Related topic: see the Talend Studio User Guide.

Repository: The schema already exists and is stored in the Repository, so it can be reused. Related topic: see the Talend Studio User Guide.

Link with a tHashOutput

Select this check box to connect to a tHashOutput component. This check box is selected by default.

Component list

Drop-down list of available tHashOutput components.
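
The Component list decides which cache tHashInput reads. The following minimal Java sketch assumes that each unlinked tHashOutput keeps its own cache, looked up by component label, so the reader only sees the rows of the output it points at. `ComponentListSketch`, `load` and `readFrom` are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of choosing a cache through the Component list; not Talend's generated code.
class ComponentListSketch {
    // One cache per (unlinked) tHashOutput, looked up by component label.
    static final Map<String, List<String>> CACHES = new HashMap<>();

    static void load(String outputLabel, String... rows) {
        CACHES.computeIfAbsent(outputLabel, k -> new ArrayList<>()).addAll(Arrays.asList(rows));
    }

    // selectedOutput stands in for the entry picked in the Component list.
    static List<String> readFrom(String selectedOutput) {
        return CACHES.getOrDefault(selectedOutput, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        load("tHashOutput_1", "a", "b");
        load("tHashOutput_2", "x");
        System.out.println(readFrom("tHashOutput_1")); // [a, b]
        System.out.println(readFrom("tHashOutput_2")); // [x]
    }
}
```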

Clear cache after reading

Select this check box to clear the cache once the data loaded by the selected tHashOutput component has been read. Any subsequent tHashInput components will then be unable to read the data cached by that tHashOutput component.
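
The effect of this option on later readers can be sketched in Java as follows. The sketch assumes that clearing simply removes the cached rows of that output after they have been handed over; the class `ClearAfterReadSketch` and its `read` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "Clear cache after reading" option; not Talend's generated code.
class ClearAfterReadSketch {
    static final Map<String, List<String>> CACHE = new HashMap<>();

    // Reads the rows cached by outputLabel and optionally frees that cache afterwards.
    static List<String> read(String outputLabel, boolean clearAfterReading) {
        List<String> cached = CACHE.get(outputLabel);
        List<String> result = (cached == null) ? new ArrayList<String>() : new ArrayList<String>(cached);
        if (clearAfterReading) {
            CACHE.remove(outputLabel); // later readers of this output find nothing
        }
        return result;
    }

    public static void main(String[] args) {
        CACHE.put("tHashOutput_1", new ArrayList<>(Arrays.asList("r1", "r2")));
        System.out.println(read("tHashOutput_1", true));  // first tHashInput: [r1, r2]
        System.out.println(read("tHashOutput_1", false)); // next tHashInput: []
    }
}
```

Clearing trades reusability for memory: it frees the cache as early as possible, which matters when large volumes are held in memory, but only makes sense for the last reader of that cache.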

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

NB_LINE: the number of rows processed. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.
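
In a Talend Job, After variables are typically read from `globalMap` with a key built from the component label and the variable name, for example `((Integer) globalMap.get("tHashInput_1_NB_LINE"))` in an expression of a later subjob. The following Java sketch imitates that pattern; the `HashMap` is only a stand-in for the Job's `globalMap`, and `runHashInput` is a hypothetical placeholder for the component.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GlobalVariablesSketch {
    // Stand-in for the globalMap that a Talend Job provides at run time.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Hypothetical stand-in for tHashInput: processes rows, then publishes NB_LINE.
    static void runHashInput(String label, List<String> cachedRows) {
        int nbLine = 0;
        for (String row : cachedRows) {
            System.out.println(row); // a real Job would pass the row to the next component
            nbLine++;
        }
        globalMap.put(label + "_NB_LINE", nbLine); // After variable: set once reading has finished
    }

    public static void main(String[] args) {
        runHashInput("tHashInput_1", Arrays.asList("a", "b", "c"));
        Integer rows = (Integer) globalMap.get("tHashInput_1_NB_LINE");
        System.out.println("Rows read: " + rows); // Rows read: 3
    }
}
```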

Usage

This component is used together with tHashOutput: it reads the data that tHashOutput has loaded into cache memory. Together, these twin components provide high-speed data access for transactions involving large amounts of data.

Limitation

n/a

Scenario 1: Reading data from the cache memory for high-speed data access

The following Job reads a large amount of data that two tHashOutput components have loaded into cache memory and passes it to a tFileOutputDelimited component. The goal of this scenario is to show how fast mass data can be read and written. In practice, a data feed generated in this way can be used as lookup input in use cases where a large amount of data needs to be referenced.

Dropping and linking the components

  1. Drag and drop the following components from the Palette to the workspace: tFixedFlowInput (X2), tHashOutput (X2), tHashInput and tFileOutputDelimited.

  2. Connect the first tFixedFlowInput to the first tHashOutput using a Row > Main link.

  3. Connect the second tFixedFlowInput to the second tHashOutput using a Row > Main link.

  4. Connect the first subjob (starting with tFixedFlowInput_1) to the second subjob (starting with tFixedFlowInput_2) using an OnSubjobOk link.

  5. Connect tHashInput to tFileOutputDelimited using a Row > Main link.

  6. Connect the second subjob to the last subjob using an OnSubjobOk link.
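
The OnSubjobOk links make the three subjobs run one after another, each starting only when the previous one has completed without error, so both caches are loaded before tHashInput reads them. The following Java sketch models only that ordering with a list of tasks; the class name, the subjob names and the use of exceptions to signal failure are assumptions, and Talend's generated Job code is structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of subjobs chained by OnSubjobOk; not Talend's generated code.
class OnSubjobOkSketch {
    // Runs subjobs in order and returns the names of those that completed.
    static List<String> runChain(List<String> names, List<Runnable> subjobs) {
        List<String> completed = new ArrayList<>();
        for (int i = 0; i < subjobs.size(); i++) {
            try {
                subjobs.get(i).run();
            } catch (RuntimeException e) {
                break; // OnSubjobOk only fires on success, so later subjobs are skipped
            }
            completed.add(names.get(i));
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("load cache 1", "load cache 2", "read and write file");
        List<Runnable> subjobs = Arrays.<Runnable>asList(
            () -> System.out.println("tFixedFlowInput_1 -> tHashOutput_1"),
            () -> System.out.println("tFixedFlowInput_2 -> tHashOutput_2"),
            () -> System.out.println("tHashInput -> tFileOutputDelimited"));
        System.out.println("Completed: " + runChain(names, subjobs));
    }
}
```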

Configuring the components

Configuring data inputs and hash cache
  1. Double-click the first tFixedFlowInput component to display its Basic settings view.

  2. Select Built-In from the Schema drop-down list.

    Note

    You can select Repository from the Schema drop-down list to fill in the relevant fields automatically if the relevant metadata has been stored in the Repository. For more information about Metadata, see the Talend Studio User Guide.

  3. Click Edit schema to define the data structure of the input flow. In this example, the input has two columns, ID and ID_Insurance. Click OK to close the dialog box.

  4. Fill in the Number of rows field to specify the number of rows to generate, e.g. 50000.

  5. Select the Use Single Table check box. In the Value column of the Values table, assign a value to each column, e.g. 1 for ID and 3 for ID_Insurance.

  6. Configure the second tFixedFlowInput component in the same way, changing only the values: 2 for ID and 4 for ID_Insurance.

  7. Double-click the first tHashOutput to display its Basic settings view.

  8. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema from the previous component. Select Keep all from the Keys management drop-down list and keep the Append check box selected.

  9. Perform the same operations for the second tHashOutput component, and select the Link with a tHashOutput check box so that it writes to the same cache as tHashOutput_1.
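
The combined effect of the steps above can be sketched in Java: each tFixedFlowInput repeats the same values for the requested number of rows, and both tHashOutput components write into one cache. The sketch assumes that Keep all means no rows are dropped when key values repeat, that a cleared Append check box would empty the cache before writing, and that the second tHashOutput is linked to tHashOutput_1. `FixedFlowToCacheSketch`, `fixedFlow` and `hashOutput` are hypothetical names, not Talend APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two loading subjobs in this scenario; not Talend's generated code.
class FixedFlowToCacheSketch {
    static final Map<String, List<int[]>> CACHES = new HashMap<>();

    // tFixedFlowInput with Use Single Table: the same {ID, ID_Insurance} values on every row.
    static List<int[]> fixedFlow(int numberOfRows, int id, int idInsurance) {
        List<int[]> rows = new ArrayList<>(numberOfRows);
        for (int i = 0; i < numberOfRows; i++) {
            rows.add(new int[] {id, idInsurance});
        }
        return rows;
    }

    // tHashOutput: cacheOwner is its own label, or the label of the tHashOutput it is linked to.
    static void hashOutput(String cacheOwner, List<int[]> rows, boolean append) {
        List<int[]> cache = CACHES.computeIfAbsent(cacheOwner, k -> new ArrayList<>());
        if (!append) {
            cache.clear(); // Append cleared: previously cached rows are discarded
        }
        cache.addAll(rows); // Keep all: identical rows are all retained
    }

    public static void main(String[] args) {
        hashOutput("tHashOutput_1", fixedFlow(50000, 1, 3), true); // first subjob
        hashOutput("tHashOutput_1", fixedFlow(50000, 2, 4), true); // second subjob, linked to tHashOutput_1
        System.out.println(CACHES.get("tHashOutput_1").size() + " rows cached");
    }
}
```

Running `main` reports 100000 cached rows (2 × 50000), which is the volume tHashInput reads in the last subjob.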

Configuring data retrieval from hash cache and data output
  1. Double-click tHashInput to display its Basic settings view.

  2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure, which is the same as that of tHashOutput.

  3. Select tHashOutput_1 from the Component list drop-down list.

  4. Double-click tFileOutputDelimited to display its Basic settings view.

  5. Select Built-In from the Property Type drop-down list. In the File Name field, enter the full path and name of the file, e.g. "E:/Allr70207V5.0/Talend-All-r70207-V5.0.0NB/workspace/out.csv".

  6. Select the Include Header check box and click Sync columns to retrieve the schema from the previous component.
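
Reading the cached rows and writing them out with a header line can be modeled in Java as below. The semicolon used as field separator and the temporary output file are assumptions made for the sketch (the Job itself writes to the path entered in File Name), and the cache is passed in as a plain list rather than read through any Talend API; `DelimitedOutputSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of tHashInput -> tFileOutputDelimited; not Talend's generated code.
class DelimitedOutputSketch {
    // Writes a header line (Include Header) followed by one delimited line per cached row.
    static void writeDelimited(List<int[]> cachedRows, Path file, String separator) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write("ID" + separator + "ID_Insurance\n");
            for (int[] row : cachedRows) {
                out.write(row[0] + separator + row[1] + "\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<int[]> cache = Arrays.asList(new int[] {1, 3}, new int[] {2, 4});
        Path file = Files.createTempFile("out", ".csv"); // stand-in for the File Name path
        writeDelimited(cache, file, ";");
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```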

Saving and executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6, or click Run on the Run tab to execute the Job.

    The large volume of rows is read from the cache and written to the output file very rapidly.

Scenario 2: Clearing the memory before loading data into it when an iterator exists in the same subjob

This scenario demonstrates how clearing the Append check box of tHashOutput helps remove repetitive or unwanted data when an iterator exists in the same subjob as tHashOutput.

To build the Job, do the following:

Dropping and linking the components

  1. Drag and drop the following components from the Palette to the workspace: tLoop, tFixedFlowInput, tHashOutput, tHashInput and tLogRow.

  2. Connect tLoop to tFixedFlowInput using a Row > Iterate link.

  3. Connect tFixedFlowInput to tHashOutput using a Row > Main link.

  4. Connect tHashInput to tLogRow using a Row > Main link.

  5. Connect tLoop to tHashInput using an OnSubjobOk link.

Configuring the components

Configuring data input and hash cache
  1. Double-click the tLoop component to display its Basic settings view.

  2. Select For as the loop type. Type in 1, 2 and 1 in the From, To and Step fields respectively. Keep the Values are increasing check box selected.

  3. Double-click the tFixedFlowInput component to display its Basic settings view.

  4. Select Built-In from the Schema drop-down list.

    Note

    You can select Repository from the Schema drop-down list to fill in the relevant fields automatically if the relevant metadata has been stored in the Repository. For more information about Metadata, see the Talend Studio User Guide.

  5. Click Edit schema to define the data structure of the input flow. In this case, the input has one column: Name.

  6. Click OK to close the dialog box.

  7. Fill in the Number of rows field to specify the number of rows to generate, for example 1.

  8. Select the Use Single Table check box. In the Values table, assign a value to the Name field, e.g. Marx.

  9. Double-click tHashOutput to display its Basic settings view.

  10. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema from the previous component. Select Keep all from the Keys management drop-down list and deselect the Append check box.
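
The difference between the two Append settings in this Job can be sketched in Java as below. The model assumes that the To value of tLoop is inclusive (consistent with the two rows tFixedFlowInput generates in this scenario) and that, with Append cleared, tHashOutput empties the cache each time it starts loading, that is, once per iteration. `AppendPerIterationSketch` and `runJob` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of this scenario's Job; not Talend's generated code.
class AppendPerIterationSketch {
    // tLoop (For, increasing) -> tFixedFlowInput (one "Marx" row) -> tHashOutput.
    static List<String> runJob(int from, int to, int step, boolean append) {
        List<String> cache = new ArrayList<>();
        for (int i = from; i <= to; i += step) { // each value fires one Iterate link
            if (!append) {
                cache.clear(); // Append cleared: memory is emptied before this load
            }
            cache.add("Marx"); // the single row generated by tFixedFlowInput
        }
        return cache; // what tHashInput reads in the next subjob
    }

    public static void main(String[] args) {
        System.out.println("Append selected, rows read: " + runJob(1, 2, 1, true).size());
        System.out.println("Append cleared, rows read: " + runJob(1, 2, 1, false).size());
    }
}
```

With Append selected, every iteration adds to the cache, so the same row accumulates once per loop value; clearing it keeps only the data loaded by the latest iteration.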

Configuring data retrieval from hash cache and data output
  1. Double-click tHashInput to display its Basic settings view.

  2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure, which is the same as that of tHashOutput.

  3. Select tHashOutput_2 from the Component list drop-down list.

  4. Double-click tLogRow to display its Basic settings view.

  5. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema from the previous component. In the Mode area, select Table (print values in cells of a table).

Saving and executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6, or click Run on the Run tab to execute the Job.

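    Only one row is output, although tFixedFlowInput generated two rows (one per iteration): because the Append check box is cleared, the memory is cleared before each load, so only the row from the last iteration remains in the cache.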