The tPigMap component is configured to join the movie data and the director data.
Once the movie data and the director data are loaded into the Job, you need to configure the tPigMap component to join them to produce the output you expect.
Double-click tPigMap to open its Map Editor view.
Drop the movieID column, the title column, the releaseYear column and the url column from the left side onto each of the output flow tables.
On the input side (left side) of the Map Editor, each of the two tables represents one of the input flows: the upper one is the main flow and the lower one is the lookup flow.
On the output side (right side), the two tables represent the output flows that you named out1 and reject when you linked tPigMap to tPigStoreResult in Dropping and linking components.
On the input side, drop the directorID column from the main flow table to the Expr.key column of the ID row in the lookup flow table.
This way, the join key between the main flow and the lookup flow is defined.
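As a rough illustration of what this join key means, the following Python sketch pairs each main flow row with the lookup row whose ID equals its directorID. It is not the code that tPigMap generates. The sample rows and the function name join_on_director are invented for this example, and keeping unmatched main rows (outer-join style) is an assumption about the join model.

```python
# Sample rows invented for illustration; None stands in for an empty field.
movies = [
    {"movieID": 1, "title": "Movie A", "releaseYear": 1999,
     "url": "http://example.com/a", "directorID": 10},
    {"movieID": 2, "title": "Movie B", "releaseYear": 2005,
     "url": "http://example.com/b", "directorID": None},
]
directors = [
    {"ID": 10, "Name": "Director X"},
]


def join_on_director(main_rows, lookup_rows):
    """Pair each main flow row with the lookup row matching its directorID."""
    # Index the lookup flow by its join key (the ID column).
    by_id = {row["ID"]: row for row in lookup_rows}
    joined = []
    for movie in main_rows:
        # Assumption: a main row without a match is kept, paired with None.
        joined.append((movie, by_id.get(movie["directorID"])))
    return joined


joined = join_on_director(movies, directors)
```

In this sketch, the first movie is paired with its director row and the second one, which has no directorID, is paired with None.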
Drop the directorID column from the main flow table to the reject table on the output side and drop the Name column from the lookup flow table to the out1 table.
The configuration in the previous two steps describes how the columns of the input data are mapped to the columns of the output data flows.
In the Schema editor view in the lower part of the editor, you can see that the schemas on both sides have been completed automatically.
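The column mapping described above can be pictured as two projections over a joined row. The Python sketch below is only an illustration under stated assumptions: the column order follows the order in which the columns are dropped, the helper name project is hypothetical, and the sample values are invented.

```python
# Output columns as described in the mapping steps (order is an assumption).
OUT1_COLUMNS = ["movieID", "title", "releaseYear", "url", "Name"]
REJECT_COLUMNS = ["movieID", "title", "releaseYear", "url", "directorID"]


def project(main_row, lookup_row, columns):
    """Build one output row from the main row and an optional lookup row."""
    # Main flow columns take precedence if a name exists on both sides.
    merged = {**(lookup_row or {}), **main_row}
    return {name: merged.get(name) for name in columns}


# Sample rows invented for illustration.
movie = {"movieID": 1, "title": "Movie A", "releaseYear": 1999,
         "url": "http://example.com/a", "directorID": 10}
director = {"ID": 10, "Name": "Director X"}

out1_row = project(movie, director, OUT1_COLUMNS)
reject_row = project(movie, None, REJECT_COLUMNS)
```

Here out1_row carries the director's Name but not the directorID, while reject_row carries the directorID but no Name, which matches the two output schemas.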
- On the out1 output flow table, click the button to display the editing field for the filter expression.
In that field, enter: row1.directorID is not null
This allows tPigMap to output only the movie records whose directorID field is not empty. A record with an empty directorID field is filtered out.
- On the reject output flow table, click the button to open the settings panel.
- In the Catch Output Reject row, select true to output the records with empty directorID fields in the reject flow.
- Click Apply, then click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
The transformation is now configured to complete the movie data with the names of their directors and write the movie records that do not contain any director data into a separate data flow.
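To make the combined effect of the filter and the reject setting concrete, here is a minimal Python sketch of the routing logic. It is an approximation for illustration, not the Pig code that the Job runs. The sample rows, the director_names dictionary and the function name split_flows are all invented, and None stands in for a null directorID.

```python
# Sample rows invented for illustration; None stands in for a Pig null.
movies = [
    {"movieID": 1, "title": "Movie A", "releaseYear": 1999,
     "url": "http://example.com/a", "directorID": 10},
    {"movieID": 2, "title": "Movie B", "releaseYear": 2005,
     "url": "http://example.com/b", "directorID": None},
]
director_names = {10: "Director X"}  # lookup flow reduced to ID -> Name


def split_flows(main_rows, names_by_id):
    """Return (out1, reject) following the filter and reject settings."""
    out1, reject = [], []
    for row in main_rows:
        base = {k: row[k] for k in ("movieID", "title", "releaseYear", "url")}
        if row["directorID"] is not None:
            # Passes the out1 filter: row1.directorID is not null
            out1.append({**base, "Name": names_by_id.get(row["directorID"])})
        else:
            # Caught by the reject table (Catch Output Reject set to true)
            reject.append({**base, "directorID": row["directorID"]})
    return out1, reject


out1, reject = split_flows(movies, director_names)
```

With these sample rows, the movie that has a directorID lands in out1 together with the director's name, and the movie without one lands in reject.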