tFlowMeterCatcher - 6.1

Talend Components Reference Guide

tFlowMeterCatcher Properties

Component family

Logs & Errors

Function

Based on a defined schema, the tFlowMeterCatcher catches the processing volumetrics from the tFlowMeter component and passes them on to the output component.

Purpose

Operates as a log function triggered by the use of a tFlowMeter component in the Job.

Basic settings

Schema and Edit Schema

A schema is a row description: it defines the fields to be processed and passed on to the next component. For this component, the schema is read-only, because it gathers standard log information, including:

Moment: Processing date and time.

Pid: Process ID.

Father_pid: Process ID of the father Job, if applicable. Otherwise, Pid is duplicated.

Root_pid: Process ID of the root Job, if applicable. Otherwise, the Pid of the current Job is duplicated.

System_pid: Process ID generated by the system.

Project: Name of the project the Job belongs to.

Job: Name of the current Job.

Job_repository_id: ID generated by the application.

Job_version: Version number of the current Job.

Context: Name of the current context.

Origin: Name of the component, if any.

Label: Label of the row connection that precedes the tFlowMeter component in the Job and is analyzed for volumetrics.

Count: Actual number of rows processed.

Reference: Number of rows passing through the reference link.

Thresholds: Only used when the relative mode is selected in the tFlowMeter component.
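As a rough illustration of how these fields might be consumed once an output component such as tFileOutputDelimited has written them to a file, the sketch below maps one delimited line onto the field list above. It rests on several assumptions that are not stated in this guide: the class name FlowMeterRecord is hypothetical, the separator is taken to be a semicolon, every field except Count and Reference is kept as plain text, and the sample line is made up for illustration only.

```java
/** One row of the read-only tFlowMeterCatcher schema, in the field order listed above. */
class FlowMeterRecord {
    final String moment, pid, fatherPid, rootPid, systemPid;
    final String project, job, jobRepositoryId, jobVersion, context;
    final String origin, label;
    final Integer count;      // rows counted on the metered connection
    final Integer reference;  // rows on the reference link; null when the field is empty
    final String thresholds;

    private FlowMeterRecord(String[] f) {
        moment = f[0]; pid = f[1]; fatherPid = f[2]; rootPid = f[3]; systemPid = f[4];
        project = f[5]; job = f[6]; jobRepositoryId = f[7]; jobVersion = f[8]; context = f[9];
        origin = f[10]; label = f[11];
        count = toInt(f[12]);
        reference = toInt(f[13]);
        thresholds = f[14];
    }

    // Empty numeric fields become null instead of failing the parse.
    private static Integer toInt(String s) {
        String t = s.trim();
        return t.isEmpty() ? null : Integer.valueOf(t);
    }

    /** Parses one line. Assumes ';' as separator and no quoting or embedded separators. */
    static FlowMeterRecord parse(String line) {
        String[] fields = line.split(";", -1); // limit -1 keeps trailing empty fields
        if (fields.length != 15) {
            throw new IllegalArgumentException("Expected 15 fields, got " + fields.length);
        }
        return new FlowMeterRecord(fields);
    }

    public static void main(String[] args) {
        // Made-up sample line for illustration only; not real Job output.
        String line = "2016-01-01 10:00:00;a1;a1;a1;1234;DEMO;flow_metrics;_id;0.1;Default;"
                + "tFlowMeter_2;filtered_states;8;50;";
        FlowMeterRecord r = parse(line);
        System.out.println(r.label + ": count=" + r.count + ", reference=" + r.reference);
    }
}
```

Keeping Reference nullable is a deliberate choice in this sketch, since this guide does not say what the column contains when no reference connection is set.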

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to open the variable list and choose the variable to use.

For further information about variables, see Talend Studio User Guide.

Usage

This component is the start component of a secondary Job which triggers automatically at the end of the main Job.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

This component cannot be used without the tFlowMeter component. For more information, see tFlowMeter.

Scenario: Catching flow metrics from a Job

The following basic Job catches the number of rows passing through the processed flow. The measure is taken twice: once right after the input component, that is, before the filtering step, and once right after the filtering step, that is, before the output component.

  • Drop the following components from the Palette to the design workspace: tMysqlInput, tFlowMeter (x2), tMap, tLogRow, tFlowMeterCatcher and tFileOutputDelimited.

  • Link the components using row main connections and click each connection label to give it a consistent name throughout the Job, for example US_States for the connection from the input component and filtered_states for the output of the tMap component.

  • Link the tFlowMeterCatcher to the tFileOutputDelimited component, also using a row main link, as data is passed between them.

  • On the tMysqlInput Component view, set the connection properties to Repository if the table metadata is stored in the Repository. Otherwise, set the Type to Built-in and manually configure the connection and schema details for this Job.

  • The 50 states of the USA are stored in the table states. To select all 50 entries of the table, run the following query on the MySQL database:

    select * from states

  • Select the relevant encoding type on the Advanced settings vertical tab.

  • Then select the next component, which is a tFlowMeter, and set its properties.

  • Select the Use input connection name as label check box so that the connection label you chose is reused in the log output file (tFileOutputDelimited).

  • Use the Absolute mode, as there is no reference flow to meter against. No threshold is set in this example.

Note

The Thresholds information is used by a supervising tool such as Talend Activity Monitoring Console to provide a proportional representation of the flow process. See Talend Activity Monitoring Console User Guide for more information.

  • Then launch the tMap editor to set the filtering properties.

  • For this use case, drag and drop the ID and State columns from the Input area of the tMap towards the Output area. No variable is used in this example.

  • In the Output flow area (labelled filtered_states in this example), click the button showing an arrow and a plus sign to activate the expression filter field.

  • Drag the State column from the Input area (row2) towards the expression filter field and type in the rest of the expression in order to filter the state labels starting with the letter M. The final expression looks like: row2.State.startsWith("M")

  • Click OK to validate the setting.

  • Then select the second tFlowMeter component and set its properties.

  • Select the check box Use input connection name as label.

  • Select Relative as Mode and in the Reference connections list, select US_States as reference to be measured against.

  • Once again, no threshold is used for this use case.

  • No particular setting is required in the tLogRow.

  • The tFlowMeterCatcher requires no particular setting either, as its properties are limited to a preset schema that includes typical log information.

  • Finally, configure the log output component (tFileOutputDelimited).

  • Select the Append check box in order to log all tFlowMeter measures.

  • Then save your Job and press F6 to execute it.

The Run view shows the filtered state labels as defined in the Job.
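As a rough illustration of the filtering step, the sketch below applies the same startsWith("M") test as the tMap expression filter, outside of Talend. The class name StateFilterSketch, the method name, and the list of full state names are assumptions made for illustration; the actual content of the states table is not shown in this guide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StateFilterSketch {
    // Assumption: the State column holds the full state names below.
    static final List<String> STATES = Arrays.asList(
        "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
        "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
        "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
        "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
        "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
        "New Hampshire", "New Jersey", "New Mexico", "New York",
        "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
        "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
        "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
        "West Virginia", "Wisconsin", "Wyoming");

    /** Keeps the names that pass the same test as the tMap expression filter. */
    static List<String> filterStartingWithM(List<String> states) {
        List<String> kept = new ArrayList<>();
        for (String state : states) {
            if (state.startsWith("M")) { // same call as row2.State.startsWith("M")
                kept.add(state);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> filtered = filterStartingWithM(STATES);
        // The two sizes correspond to what the two tFlowMeter components count.
        System.out.println("rows before filter: " + STATES.size());
        System.out.println("rows after filter:  " + filtered.size());
        System.out.println(filtered);
    }
}
```

Note that startsWith is case-sensitive, so a table storing names in lower case would need a different test.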

In the delimited CSV file, the value in the count column differs between tFlowMeter1 and tFlowMeter2, because the second measure is taken after the filtering. The reference column also shows this difference.
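To make the relation between the count and reference columns concrete, the following minimal sketch computes the share of the reference flow that reaches the second meter. The class and method names are hypothetical, and the figures assume that the table holds the 50 full state names, of which eight begin with M; the exact values in your file depend on your data.

```java
import java.util.Locale;

class RelativeFlowSketch {
    /** Share of the reference flow that reached the metered connection, in percent. */
    static double percentOfReference(int count, int reference) {
        if (reference <= 0) {
            throw new IllegalArgumentException("reference must be positive");
        }
        return 100.0 * count / reference;
    }

    public static void main(String[] args) {
        int reference = 50; // assumed rows on the US_States connection
        int count = 8;      // assumed rows on the filtered_states connection
        System.out.println(String.format(Locale.ROOT,
                "filtered_states carries %.1f%% of US_States",
                percentOfReference(count, reference)));
    }
}
```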