tIntervalMatch - 6.1

Talend Components Reference Guide

Version: 6.1
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for Data Quality, Talend Open Studio for ESB, Talend Open Studio for MDM, Talend Real-Time Big Data Platform
Task: Data Governance, Data Quality and Preparation, Design and Development
Platform: Talend Studio

tIntervalMatch properties

Component family

Data Quality

Function

tIntervalMatch receives a main flow and aggregates it based on a join to a lookup flow. It then matches a specified value against a range of values and returns the related information.

Purpose

This component returns a value based on a Join relation.

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.

Repository: The schema already exists and is stored in the Repository, so it can be reused in various projects and Job designs. Related topic: see Talend Studio User Guide.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

Search Column

Select the main flow column containing the values to be matched with a range of values.

Column (LOOKUP)

Select the lookup flow column containing the values to be returned when the Join succeeds.

Lookup Column (min) / Include the bound (min)

Select the column containing the minimum value of the range. Select the check box to include the minimum value of the range in the match.

Lookup Column (max) / Include the bound (max)

Select the column containing the maximum value of the range. Select the check box to include the maximum value of the range in the match.
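
The way the range columns interact with the two Include the bound check boxes can be illustrated with a minimal Python sketch. This is not Talend code: the function `interval_match`, its parameters, the sample ranges, the first-match rule, and the `None` result for unmatched values are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation of one lookup row: (min, max, value to return).
Range = Tuple[str, str, str]


def interval_match(search: str, ranges: Sequence[Range],
                   include_min: bool = True,
                   include_max: bool = True) -> Optional[str]:
    """Return the value of the first range containing `search`, else None.

    Assumption: the first matching range wins; the source does not say
    how overlapping ranges are resolved.
    """
    for low, high, value in ranges:
        # An included bound uses >= / <=, an excluded bound uses > / <.
        above_min = search >= low if include_min else search > low
        below_max = search <= high if include_max else search < high
        if above_min and below_max:
            return value
    return None


# Illustrative ranges that share the boundary value "020".
ranges = [("010", "020", "A"), ("020", "030", "B")]

print(interval_match("020", ranges))                     # A
print(interval_match("020", ranges, include_max=False))  # B
```

With both bounds included, the boundary value `"020"` falls into the first range. Once the maximum bound is excluded, the same value only matches the second range, whose minimum bound is still included.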

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. If the component has a Die on error check box, this variable functions only when that check box is cleared.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage

This component handles a flow of data and therefore requires both an input and an output. It is used as an intermediate step.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a

Scenario: Identifying server locations based on their IP addresses

This scenario describes a four-component Job that checks the server IP addresses listed in the main input file against a list of IP ranges given in a lookup file to identify the hosting country for each server.

Setting up the Job

The Job requires two tFileInputDelimited components, a tIntervalMatch component and a tLogRow component.

  1. Drop the components onto the design workspace.

  2. Connect the components using Row > Main connections.

    Note that the connection from the second tFileInputDelimited component to the tIntervalMatch component will appear as a Lookup connection.

Configuring the components

  1. Double-click the first tFileInputDelimited component to open its Basic settings view.

  2. Browse to the file to be used as the main input, which provides a list of servers and their IP addresses:

    Server;IP
    Server1;057.010.010.010
    Server2;001.010.010.100
    Server3;057.030.030.030
    Server4;053.010.010.100

  3. Click the [...] button next to Edit schema to open the [Schema] dialog box and define the input schema. According to the input file structure, the schema is made of two columns, Server and IP, both of type String. Then click OK to close the dialog box.

  4. Define the number of header rows to be skipped, and keep the other settings as they are.

  5. Define the properties of the second tFileInputDelimited component similarly.

    The file to be used as the input to the lookup flow in this example lists some IP address ranges and the corresponding countries:

    StartIP;EndIP;Country
    001.000.000.000;001.255.255.255;USA
    002.006.190.056;002.006.190.063;UK
    011.000.000.000;011.255.255.255;USA
    057.000.000.000;057.255.255.255;France
    012.063.178.060;012.063.178.063;Canada
    053.000.000.000;053.255.255.255;Germany

    Accordingly, the schema of the lookup flow should have three columns, StartIP, EndIP and Country, following the structure of the lookup file.

  6. Double-click the tIntervalMatch component to open its Basic settings view.

  7. From the Search Column list, select the main flow column containing the values to be matched with the range values. In this example, we want to match the servers' IP addresses with the range values from the lookup flow.

  8. From the Column (LOOKUP) list, select the lookup column that holds the values to be returned. In this example, we want to get the names of countries where the servers are hosted.

  9. Set the min and max lookup columns corresponding to the range bounds defined in the lookup schema, StartIP and EndIP respectively in this example.
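
The matching configured in the steps above can be approximated outside the Studio with a short Python sketch. It is only an illustration of the logic, not what the Job generates: the helper names `read_rows` and `match_country` are hypothetical, both range bounds are assumed to be included, and the first matching range is assumed to win. The sample data is copied from the two input files shown above.

```python
import csv
import io

# Main input: servers and their IP addresses (from the scenario).
MAIN = """\
Server;IP
Server1;057.010.010.010
Server2;001.010.010.100
Server3;057.030.030.030
Server4;053.010.010.100
"""

# Lookup input: IP ranges and the corresponding countries (from the scenario).
LOOKUP = """\
StartIP;EndIP;Country
001.000.000.000;001.255.255.255;USA
002.006.190.056;002.006.190.063;UK
011.000.000.000;011.255.255.255;USA
057.000.000.000;057.255.255.255;France
012.063.178.060;012.063.178.063;Canada
053.000.000.000;053.255.255.255;Germany
"""


def read_rows(text):
    """Parse semicolon-delimited text with a header row into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def match_country(ip, ranges):
    """Return the Country of the first range containing ip, else None.

    Assumption: both bounds are included in the match.
    """
    for r in ranges:
        if r["StartIP"] <= ip <= r["EndIP"]:
            return r["Country"]
    return None


ranges = read_rows(LOOKUP)
results = [(row["Server"], row["IP"], match_country(row["IP"], ranges))
           for row in read_rows(MAIN)]

for server, ip, country in results:
    print(server, ip, country)
# Server1 057.010.010.010 France
# Server2 001.010.010.100 USA
# Server3 057.030.030.030 France
# Server4 053.010.010.100 Germany
```

The plain string comparison works in this sketch because every octet in the sample files is zero-padded to three digits, so lexicographic order and numeric order coincide. Addresses written without padding would need to be normalized first.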

Executing the Job

  • Press Ctrl+S to save your Job and press F6 to run it.

    The name of the country where each server is hosted is displayed next to the IP address.