- Double-click tBlockedFuzzyJoin to display its Basic settings view and define its properties.
- Click the Edit schema button to open a dialog box. Here you can define the data you want to pass to the output components. In this example, we want to pass the four input columns to the output components, along with the new column ref_firstname.
- Click OK to close the dialog box and proceed to the next step.
- In the Key definition area of the Basic settings view of tBlockedFuzzyJoin, click the plus button to add two columns to the list.
- From the Input key attribute and Lookup key attribute lists respectively, select the input columns and the lookup columns you want to run the fuzzy matching on: grp and firstname in this example.
- Click in the first cell of the Matching type column and select from the list the method used to check the incoming data against the reference data: Exact match in this example. This method requires no minimum or maximum distance.
- Set the matching type for the second column, Levenshtein in this example.
- Then set the minimum and maximum distances. In this method, the distance is the number of character changes (insertions, deletions or substitutions) that need to be carried out for the entry to fully match the reference. In this example, we want the min. distance to be 0 and the max. distance to be 2. This outputs all entries in the firstname column that either match the reference exactly or differ from it by at most two character changes.
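
To make the two matching types concrete, here is a minimal Python sketch of the logic configured above: an exact match on grp, followed by a Levenshtein comparison on firstname with a distance between 0 and 2. It is an illustration only, not the component's actual implementation. The function names, the sample rows and the case-sensitive comparison are assumptions, and only the two key columns are shown rather than the full four-column schema.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def blocked_fuzzy_join(inputs, lookups, min_dist=0, max_dist=2):
    """Hypothetical helper: pair input rows with lookup rows whose grp is
    identical and whose firstname lies within [min_dist, max_dist]."""
    matches = []
    for row in inputs:
        for ref in lookups:
            if row["grp"] != ref["grp"]:  # Exact match on the first key
                continue
            distance = levenshtein(row["firstname"], ref["firstname"])
            if min_dist <= distance <= max_dist:  # Levenshtein on the second key
                matches.append({**row, "ref_firstname": ref["firstname"]})
    return matches


# Made-up sample data, for illustration only.
inputs = [
    {"grp": "A", "firstname": "Jon"},
    {"grp": "A", "firstname": "Katherine"},
    {"grp": "B", "firstname": "John"},
]
lookups = [
    {"grp": "A", "firstname": "John"},
    {"grp": "A", "firstname": "Kate"},
]

result = blocked_fuzzy_join(inputs, lookups)
print(result)
```

With this sample data, only "Jon" is paired with the reference "John": they share the same grp and differ by one insertion. "Katherine" is too far from both references, and the "John" row in group B has no reference row in its group to compare against.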