The tMap component is configured to join the movie data and the director data.
Once the movie data and the director data are loaded into the Job, you need to configure the tMap component to join them to produce the output you expect.
- Double-click tMap to open its Map Editor view.
- Drop the movieID column, the title column, the releaseYear column and the url column from the left side onto each of the output flow tables.
On the input side (left side) of the Map Editor, each of the two tables represents one of the input flows: the upper one is the main flow and the lower one is the lookup flow.
On the output side (right side), the two tables represent the output flows that you named out1 and reject when you linked tMap to tHDFSOutput and tFileOutputDelimited in Dropping and linking MapReduce components.
- On the input side, drop the directorID column from the main flow table to the Expr.key column of the ID row in the lookup flow table. This way, the join key between the main flow and the lookup flow is defined.
- Drop the directorID column from the main flow table to the reject table on the output side, and drop the Name column from the lookup flow table to the out1 table.
In the Schema editor view in the lower part of the editor, you can see that the schemas on both sides have been automatically completed.
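As a rough illustration of what the output schemas should contain after these drops, the mapping can be written out as plain data. This is a Python sketch for reading purposes only, not Talend code: the flow prefixes `main` and `lookup` are placeholder names, and the actual row names in your Map Editor depend on how the input links are labeled.

```python
# Illustrative only: the column mappings described in the steps above.
# "main" and "lookup" are placeholder names for the two input flows.
SHARED = ["movieID", "title", "releaseYear", "url"]

mappings = {
    # out1 receives the four shared columns plus the director name from the lookup flow
    "out1": {**{col: f"main.{col}" for col in SHARED}, "Name": "lookup.Name"},
    # reject receives the four shared columns plus the unmatched directorID
    "reject": {**{col: f"main.{col}" for col in SHARED}, "directorID": "main.directorID"},
}


def output_schema(flow):
    """Return the column names of an output flow, in mapping order."""
    return list(mappings[flow])


for flow in mappings:
    print(flow, output_schema(flow))
```

The two output tables differ only in their last column: out1 ends with Name and reject ends with directorID.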
- On the lookup flow table, click the button to display the settings panel for the join operation.
- In the Join model row, click the Value column and click the [...] button that is displayed.
The Options window is displayed.
- Select Inner join in order to output only the records that contain join keys that exist in both the main flow and lookup flow.
- On the reject output flow table, click the button to open the settings panel.
- In the Catch Lookup inner join reject row, select true to output the records that are rejected by the inner join performed on the input side.
- Click Apply, then click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
The transformation is now configured to complete the movie data with the names of their directors and write the movie records that do not contain any director data into a separate data flow.
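The behavior configured above can be sketched in a few lines of Python. This is only an approximation of the logic for illustration, not the code that the Job generates: the function name `join_movies` and the sample records are hypothetical, and the sketch assumes that each director ID appears at most once in the lookup data.

```python
def join_movies(movies, directors):
    """Inner-join movies to directors on directorID == ID and catch the rejects."""
    # Index the lookup flow by its join key.
    names_by_id = {d["ID"]: d["Name"] for d in directors}
    out1, reject = [], []
    for movie in movies:
        base = {k: movie[k] for k in ("movieID", "title", "releaseYear", "url")}
        if movie["directorID"] in names_by_id:
            # Join key found in both flows: complete the record with the director name.
            out1.append({**base, "Name": names_by_id[movie["directorID"]]})
        else:
            # Inner join reject: keep the record and its unmatched directorID.
            reject.append({**base, "directorID": movie["directorID"]})
    return out1, reject


# Hypothetical sample data, for illustration only.
movies = [
    {"movieID": 1, "title": "Movie A", "releaseYear": 2001,
     "url": "http://example.com/a", "directorID": 10},
    {"movieID": 2, "title": "Movie B", "releaseYear": 2005,
     "url": "http://example.com/b", "directorID": 99},
]
directors = [{"ID": 10, "Name": "Director X"}]

out1, reject = join_movies(movies, directors)
print(out1)
print(reject)
```

With this sample data, the first movie goes to out1 with its director name, and the second movie, whose directorID has no match in the lookup data, goes to reject.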