The tMap component is configured to join the
movie data and the director data.
Once the movie data and the director data are loaded into the Job, you need
to configure the tMap component to join them to
produce the output you expect.
Procedure
-
Double-click tMap to open its
Map Editor view.
-
Drop the movieID column, the
title column, the releaseYear column and the url column from the left side onto each of the output flow
tables.
On the input side (left side) of the Map Editor, each of the two tables represents one of the
input flows: the upper one is the main flow and the lower one is the lookup
flow.
On the output side (right side), the two tables
represent the output flows that you named out1 and reject when you linked tMap to the two tFileOutputParquet components in
Dropping and linking Spark components.
-
On the input side, drop the directorID column from the main flow table onto the Expr.key column of the ID row in the lookup flow table. This defines the join key between
the main flow and the lookup flow.
-
Drop the directorID column
from the main flow table to the reject table
on the output side and drop the Name column
from the lookup flow table to the out1
table.
In the Schema editor
view in the lower part of the editor, you can see that the schemas on both sides
have been automatically completed.
-
On the lookup flow table, click the
button to
display the settings panel for the join operation.
-
In the Join model row, click
the Value column and click the [...] button that is displayed.
The Options window is
displayed.
-
Select Inner join to
output only the records whose join key exists in both the main flow
and the lookup flow.
-
In the Match Model row, repeat the same operation to select
All matches.
-
On the reject output flow
table, click the
button to open the settings panel.
-
In the Catch Lookup inner join
reject row, select true to
output the records that are rejected by the inner join performed on the input
side.
-
Click Apply, then click
OK to validate these changes and accept
the propagation prompted by the pop-up dialog box.
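The join settings chosen in the steps above can be pictured with a short plain-Python sketch. This is not the code that the Job generates; it only illustrates, under stated assumptions, what an inner join with All matches and a caught reject flow would do. The function name inner_join_all_matches and the sample rows are hypothetical, and the sketch assumes that All matches means every lookup row sharing the join key produces an output row.

```python
from collections import defaultdict


def inner_join_all_matches(main_rows, lookup_rows, main_key, lookup_key):
    """Return (matched, rejected) for an inner join that keeps every match."""
    # Index the lookup flow by its join key; one key may map to several rows.
    index = defaultdict(list)
    for row in lookup_rows:
        index[row[lookup_key]].append(row)

    matched, rejected = [], []
    for row in main_rows:
        hits = index.get(row[main_key], [])
        if hits:
            # Assumed "All matches" behavior: one pair per matching lookup row.
            matched.extend((row, hit) for hit in hits)
        else:
            # Inner-join reject: the main row has no counterpart in the lookup.
            rejected.append(row)
    return matched, rejected


# Made-up sample data, for illustration only.
movies = [
    {"movieID": 1, "title": "A", "directorID": 10},
    {"movieID": 2, "title": "B", "directorID": 99},
]
directors = [
    {"ID": 10, "Name": "Director X"},
    {"ID": 10, "Name": "Director Y"},
]

matched, rejected = inner_join_all_matches(movies, directors, "directorID", "ID")
```

In this sketch, the first movie is paired with both lookup rows that carry its key, while the second movie, whose directorID has no match, ends up in the rejected list instead of being silently dropped.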
Results
The transformation is now configured to complete the movie data with the
names of their directors and write the movie records that do not contain any
director data into a separate data flow.
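As a rough illustration of the two output schemas described in this procedure, the following Python sketch builds the row that would go to out1 or to reject for a single movie record. The helper route_movie and the sample values are hypothetical; only the column names come from the steps above, and the column order is an assumption.

```python
def route_movie(movie, director=None):
    """Build the out1 or reject row for one movie record."""
    # Columns dropped onto both output tables in the Map Editor.
    row = {col: movie[col] for col in ("movieID", "title", "releaseYear", "url")}
    if director is not None:
        # Matched record: out1 also receives the Name column from the lookup.
        row["Name"] = director["Name"]
        return "out1", row
    # Unmatched record: reject also receives the directorID from the main flow.
    row["directorID"] = movie["directorID"]
    return "reject", row


# Made-up sample record, for illustration only.
movie = {
    "movieID": 1,
    "title": "A",
    "releaseYear": 2000,
    "url": "http://example.com/1",
    "directorID": 10,
}

flow_matched, row_matched = route_movie(movie, {"ID": 10, "Name": "Director X"})
flow_rejected, row_rejected = route_movie(movie)
```

The matched row carries the director's Name, while the rejected row keeps the directorID so that the missing director can be traced later.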