Scenario: Identifying server locations based on their IP addresses - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario describes a four-component Job that checks the server IP addresses listed in the main input file against a list of IP ranges given in a lookup file to identify the hosting country for each server.

Setting up the Job

The Job requires two tFileInputDelimited components, a tIntervalMatch component and a tLogRow component.

  1. Drop the components onto the design workspace.

  2. Connect the components using Row > Main connections.

    Note that the connection from the second tFileInputDelimited component to the tIntervalMatch component will appear as a Lookup connection.

Configuring the components

  1. Double-click the first tFileInputDelimited component to open its Basic settings view.

  2. Browse to the file to be used as the main input, which provides a list of servers and their IP addresses:

    Server;IP
    Server1;057.010.010.010
    Server2;001.010.010.100
    Server3;057.030.030.030
    Server4;053.010.010.100

  3. Click the [...] button next to Edit schema to open the [Schema] dialog box and define the input schema. To match the input file structure, the schema consists of two columns, Server and IP, both of type String. Then click OK to close the dialog box.

  4. Define the number of header rows to be skipped (1 in this example, since the file starts with a single header row), and keep the other settings as they are.

  5. Define the properties of the second tFileInputDelimited component similarly.

    The file to be used as the input to the lookup flow in this example lists some IP address ranges and the corresponding countries:

    StartIP;EndIP;Country
    001.000.000.000;001.255.255.255;USA
    002.006.190.056;002.006.190.063;UK
    011.000.000.000;011.255.255.255;USA
    057.000.000.000;057.255.255.255;France
    012.063.178.060;012.063.178.063;Canada
    053.000.000.000;053.255.255.255;Germany

    Accordingly, the schema of the lookup flow consists of three columns, StartIP, EndIP and Country, all of type String.

  6. Double-click the tIntervalMatch component to open its Basic settings view.

  7. From the Search Column list, select the main flow column containing the values to be matched with the range values. In this example, we want to match the servers' IP addresses with the range values from the lookup flow.

  8. From the Column (LOOKUP) list, select the lookup column that holds the values to be returned. In this example, we want to get the names of countries where the servers are hosted.

  9. Set the min and max lookup columns to the range bounds defined in the lookup schema: StartIP and EndIP respectively in this example.
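
The matching configured in the steps above can be sketched outside the Studio as follows. This is a minimal Python illustration, not the code that the tIntervalMatch component generates: the function names read_rows and interval_match are hypothetical, and the sketch assumes that the range bounds are inclusive and that values are compared as plain strings, which is only meaningful because every octet in the sample files is written with three digits.

```python
import csv
import io

# Sample data copied from the two input files of this scenario.
MAIN_FILE = """Server;IP
Server1;057.010.010.010
Server2;001.010.010.100
Server3;057.030.030.030
Server4;053.010.010.100
"""

LOOKUP_FILE = """StartIP;EndIP;Country
001.000.000.000;001.255.255.255;USA
002.006.190.056;002.006.190.063;UK
011.000.000.000;011.255.255.255;USA
057.000.000.000;057.255.255.255;France
012.063.178.060;012.063.178.063;Canada
053.000.000.000;053.255.255.255;Germany
"""


def read_rows(text):
    """Parse semicolon-delimited text, using the first row as column names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def interval_match(value, ranges, min_col="StartIP", max_col="EndIP",
                   out_col="Country"):
    """Return out_col of the first range whose bounds enclose value, else None.

    Assumption: bounds are inclusive and values compare as plain strings.
    """
    for row in ranges:
        if row[min_col] <= value <= row[max_col]:
            return row[out_col]
    return None


servers = read_rows(MAIN_FILE)   # main flow
ranges = read_rows(LOOKUP_FILE)  # lookup flow

# Search column: IP; returned lookup column: Country.
matches = {s["Server"]: interval_match(s["IP"], ranges) for s in servers}

for s in servers:
    print(s["Server"], s["IP"], matches[s["Server"]])
```

With the sample files, the sketch pairs Server1 and Server3 with France, Server2 with USA and Server4 with Germany. An address that falls in none of the ranges yields None here; how the component itself treats unmatched rows is not covered by this sketch.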

Executing the Job

  • Press Ctrl+S to save your Job and press F6 to run it.

    The name of the country where each server is hosted is displayed next to the IP address.
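
Both sample files write every octet with three digits (057.010.010.010 rather than 57.10.10.10). If the values are compared as ordinary strings, which is an assumption about how the String columns are matched, this fixed-width form is what keeps string order in line with numeric order. The sketch below shows a hypothetical helper, pad_ip, that could be used to prepare addresses in this form before they reach the Job; it is not part of Talend.

```python
def pad_ip(address):
    """Rewrite a dotted IPv4 address with three digits per octet."""
    octets = address.strip().split(".")
    if len(octets) != 4:
        raise ValueError(f"expected four octets, got {address!r}")
    padded = []
    for octet in octets:
        number = int(octet)  # raises ValueError on non-numeric input
        if not 0 <= number <= 255:
            raise ValueError(f"octet out of range in {address!r}")
        padded.append(f"{number:03d}")
    return ".".join(padded)


print(pad_ip("57.10.10.10"))  # 057.010.010.010

# Without padding, string order and numeric order disagree:
print("9.0.0.0" < "57.0.0.0")                  # False
print(pad_ip("9.0.0.0") < pad_ip("57.0.0.0"))  # True
```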