Doing an inner join using a tMap component

author
Shicong Hong
EnrichVersion
6.4
6.3
6.2
6.1
6.0
EnrichProdName
Talend Big Data Platform
Talend Big Data
Talend Data Integration
Talend Data Services Platform
Talend ESB
Talend Open Studio for Data Integration
Talend Data Fabric
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
Talend MDM Platform
Talend Data Management Platform
Talend Open Studio for Big Data
task
Data Quality and Preparation > Third-party systems > Processing components (Integration) > tMap
Design and Development > Third-party systems > Processing components (Integration) > tMap
Data Governance > Third-party systems > Processing components (Integration) > tMap
EnrichPlatform
Talend Studio

Mapping data sources using inner join

You often need to perform joins between data sources to get the desired results. tMap is a transformation component that allows you to do joins. Both inner joins and outer joins are supported. In this procedure, you will learn how to do an inner join on tMap and manage the matched and rejected data.

In this example, there are two data sources: a text file named in.txt and a table named person in a Mysql database. The data of each data source are read as follows:

in.txt

id;name;sex
1;Shong;boy
2;Ross;boy
3;Sabrina;girl
4;Elisa;girl
Table 1. person table
user_id name country email
1 Shong CN Shong@talend.com
4 Elisa FR Elisa@talend.com

This example uses a tMap component to:

  • read the data from the in.txt file as a main flow,
  • do an inner join with the data from the person table based on the id column from the file and the user_id column from the table.

The join will get rows, including the external columns from the lookup table, if the id already exists in the table, and will reject rows if the id does not exist in the table.

Here is the expected output:
Table 2. Matched rows:
id name sex email
1 Shong boy Shong@talend.com
4 Elisa girl Elisa@talend.com
Table 3. Unmatched rows:
id name sex
2 Ross boy
3 Sabrina girl

Configuring the components

In this Job, you will perform an inner join between the two incoming data flow, generate two output tables, one table for the matched rows, and another for the unmatched rows.

This Job uses the following components:

  • a tFileInputDelimited component to read data from in.txt file,
  • a tMysqlInputDelimited component to read data from person table,
  • a tMap component to perform an inner join and generate two output tables,
  • and two tLogRow components to print output data on the console.

Procedure

  1. Double-click the tFileInputDelimited to configure its Basic settings.
    1. Fill the field File name/Stream with the path of the in.txt file.
    2. Fill respectively the field Field Separator with ";" and the field Header with 1.
    3. Create a three column schema with a column id of Integer type , a column name of String type and a column sex of String type.
  2. Double-click the tMysqlInput to configure its Basic settings as below.
    1. Fill the fields to access the database and the table created for the procedure.
    2. Enter the following query in the Query box:
      "select user_id, name, country, email from person"
  3. Create a four column schema with a column id of Integer type and a column name, country and email of String type.
    1. Set the DB type of the column id to INT and DB type of the three columns of Integer type to VARCHAR check.
    2. Make sure that the Nullable check box is selected for these three columns.
    3. Set the length of the id and name columns to 10 and 11 and their precision to 0.
    4. Set the id column as key column.
  4. Double-click the two tLogRow components and select the Table (print values in cells of a table) check box.

Doing an inner join with the tMap component

Double-click the tMap component to open its editor.

Procedure

  1. Set an explicit join between the main flow Row1 and the lookup flow Row2 by dropping the id column of the Row1 table to the id column of the Row2 table.
  2. Perform an inner join between the two incoming data flow, and generate two output tables, one table for the matched rows, and another for the unmatched rows.
  3. Drag the columns id, name, sex from the Row1 table and email from the row2 table to the matched table as below.
  4. Drag the id, name and sex columns from the Row1 table to the unmatched table.
  5. In the unmatched table, set the Catch lookup Inner join reject option to true to get the rejected rows as below.
  6. Click Apply and accept propagation when prompted.

Executing the Job

Procedure

Execute the Job. The console shows the following results:

The results show that all matched rows are output into the first tLogRow and all unmatched rows are output into the second tLogRow.