In this example, there are two data sources: a text file named in.txt and a table named person in a Mysql database. The data of each data source are read as follows:
id;name;sex 1;Shong;boy 2;Ross;boy 3;Sabrina;girl 4;Elisa;girl
This example reads the data from the in.txt file as a main flow, and then does an inner join with the data read from the person table on a tMap component based on the id column from the file and the user_id column from the table. The join will get rows, including the external columns from the lookup table, if the id already exists in the table, and will reject rows if the id does not exist in the table.
Here is the expected output:
Configuring the components
In this Job, you will perform an inner join between the two incoming data flow, generate two output tables, one table for the matched rows, and another for the unmatched rows.
This Job will use the following components:
- a tFileInputDelimited component to read data from in.txt file,
- a tMysqlInputDelimited component to read data from person table,
- a tMap to perform an inner join and generate two output tables,
- and two tLogRow components to print output data on the console.
Double-click the tFileInputDelimited to configure its
- Fill the field File name/Stream with the path of the in.txt file.
- Fill respectively the field Field Separator with ";" and the field Header with 1.
- Create a three column schema with a column id of Integer type , a column name of String type and a column sex of String type.
Double-click the tMysqlInput to configure its
Basic settings as below.
- Fill the fields to access the database and the table created for the procedure.
Enter the following query in the Query
"select user_id, name, country, email from person"
Create a four column schema with a column id of
Integer type and a column
name, country and
email of String type.
- Set the DB type of the column id to INT and DB type of the three columns of Integer type to VARCHAR check.
- Make sure that the Nullable check box is selected for these three columns.
- Set the length of the id and name columns to 10 and 11 and their precision to 0.
- Set the id column as key column.
- Double-click the two tLogRow components to check the Table (print values in cells of a table) box.
Doing an inner join with the tMap component
When double-clicking the tMap component, the following schema will display:
- Set an explicit join between the main flow Row1 and the lookup flow Row2 by dropping the id column of the Row1 table to the id column of the Row2 table.
- Define this join as an inner join by clicking the tMap settings red button, clicking the Value field for Join Model, clicking the small button that appears in the field, and selecting Inner Join from the [Options] dialog box.
Drag the columns id, name,
sex from the Row1 table and
email from the row2 table to
the matched table as below.
- Drag the following columns from the Row1 table to the unmatched table: id, name and sex.
In the unmatched table, set the Catch lookup
Inner join reject option as true to get
the rejected rows as below.
- Click Apply and accept the propagation prompted by the pop-up dialog box.