Mapping data sources using inner join
In this example, there are two data sources: a text file named in.txt and a table named person in a Mysql database. The data of each data source are read as follows:
id;name;sex 1;Shong;boy 2;Ross;boy 3;Sabrina;girl 4;Elisa;girl
This example uses a tMap component to:
- read the data from the in.txt file as a main flow,
- do an inner join with the data from the person table based on the
idcolumn from the file and the
user_idcolumn from the table.
The join will get rows, including the external columns from the lookup table,
id already exists in the table, and will reject rows if the
id does not exist in the table.
Configuring the components
In this Job, you will perform an inner join between the two incoming data flow, generate two output tables, one table for the matched rows, and another for the unmatched rows.
This Job uses the following components:
- a tFileInputDelimited component to read data from in.txt file,
- a tMysqlInputDelimited component to read data from person table,
- a tMap component to perform an inner join and generate two output tables,
- and two tLogRow components to print output data on the console.
Double-click the tFileInputDelimited to configure its
- Fill the field File name/Stream with the path of the in.txt file.
- Fill respectively the field Field Separator with ";" and the field Header with 1.
- Create a three column schema with a column id of Integer type , a column name of String type and a column sex of String type.
Double-click the tMysqlInput to configure its
Basic settings as below.
- Fill the fields to access the database and the table created for the procedure.
Enter the following query in the Query
"select user_id, name, country, email from person"
Create a four column schema with a column id of
Integer type and a column
name, country and
email of String type.
- Set the DB type of the column id to INT and DB type of the three columns of Integer type to VARCHAR check.
- Make sure that the Nullable check box is selected for these three columns.
- Set the length of the id and name columns to 10 and 11 and their precision to 0.
- Set the id column as key column.
- Double-click the two tLogRow components and select the Table (print values in cells of a table) check box.
Doing an inner join with the tMap component
Double-click the tMap component to open its editor.
- Set an explicit join between the main flow Row1 and the lookup flow Row2 by dropping the id column of the Row1 table to the id column of the Row2 table.
- Perform an inner join between the two incoming data flow, and generate two output tables, one table for the matched rows, and another for the unmatched rows.
Drag the columns id, name,
sex from the Row1 table and
email from the row2 table to
the matched table as below.
- Drag the id, name and sex columns from the Row1 table to the unmatched table.
In the unmatched table, set the Catch lookup
Inner join reject option to true to get
the rejected rows as below.
- Click Apply and accept propagation when prompted.
Executing the Job
The results show that all matched rows are output into the first tLogRow and all unmatched rows are output into the second tLogRow.