Double-click tFileInputDelimited to open its
Basic settings view and define its properties.
- Click the three-dot button next to the File Name field to browse to the file holding the input data.
If needed, set Header, Footer, and Limit.
For this scenario, set Header to 1. Leave Footer and Limit (the maximum number of rows to process) unset.
Click Edit schema to open a dialog box where you
can describe the data structure of the source delimited file.
In this scenario, the source schema is made of the following columns: ID, Status, FirstName, Email, City, Initial, and ZipCode.
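As a rough illustration of what these settings mean outside the Studio, the sketch below reads a delimited file, skips one header row, and maps each remaining row to the schema columns listed above. It is not how tFileInputDelimited is implemented. The `read_delimited` function, the semicolon delimiter, and the sample rows are assumptions made for the example.

```python
import csv
import io

# Column names taken from the scenario's schema.
SCHEMA = ["ID", "Status", "FirstName", "Email", "City", "Initial", "ZipCode"]


def read_delimited(stream, header=1, footer=0, limit=None, delimiter=";"):
    """Hypothetical helper: drop `header` leading rows and `footer` trailing
    rows, keep at most `limit` rows, and map each row to SCHEMA."""
    rows = list(csv.reader(stream, delimiter=delimiter))
    body = rows[header:len(rows) - footer]
    if limit is not None:
        body = body[:limit]
    return [dict(zip(SCHEMA, row)) for row in body]


# Made-up sample data, not the scenario's actual input file.
sample = io.StringIO(
    "ID;Status;FirstName;Email;City;Initial;ZipCode\n"
    "1;Single;Anna;anna@example.com;Paris;A;75001\n"
    "2;Married;Ana;ana@example.com;Paris;A;75001\n"
)
records = read_delimited(sample)  # Header = 1, no Footer, no Limit
```

With Header set to 1, the first line is treated as column titles and is not returned as data.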
- Double-click tFuzzyUniqRow to display its Basic settings view and define its properties.
In the Key Attribute column, select the check boxes
next to the columns you want to check using the defined matching method:
FirstName, Email, City,
and ZipCode in this example.
In the Matching Type column, set the matching
methods you want to use on each of the selected columns.
In this example, Levenshtein is used as the matching method for the FirstName, Email, and ZipCode columns, and Double Metaphone is used as the matching method for the City column.
Then set the minimum and maximum distances for the Levenshtein method. In this method, the distance is the number of character changes (insertions, deletions, or substitutions) that need to be carried out for the entry to fully match the reference.
In this example, set the min. distance to 0 and the max. distance to 2. This outputs all entries in the FirstName, Email, and ZipCode columns that match exactly or that differ by at most two character changes.
There is no minimum or maximum distance to set for Double Metaphone because this matching method is based on phonetic discrepancies in the input data.
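To make the distance settings concrete, here is a minimal sketch of the standard Levenshtein edit distance together with a range check using the values from this example (min. 0, max. 2). It illustrates the definition given above and is not the code tFuzzyUniqRow runs internally. The function names `levenshtein` and `within_range` and the sample strings are chosen for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions
    needed to turn `a` into `b` (classic dynamic-programming formulation)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep it
            ))
        previous = current
    return previous[-1]


def within_range(entry: str, reference: str,
                 min_distance: int = 0, max_distance: int = 2) -> bool:
    """Hypothetical helper: True when the distance lies in the configured range."""
    return min_distance <= levenshtein(entry, reference) <= max_distance


# Made-up values, not taken from the scenario's input file.
print(levenshtein("Anna", "Ana"))         # 1: one deletion
print(within_range("75001", "75010"))     # True: two substitutions
print(within_range("kitten", "sitting"))  # False: distance is 3
```

With a minimum of 0, exact matches (distance 0) count as matches too. Raising the maximum makes the matching more tolerant of typos.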
Double-click the first tFileOutputExcel to display
its Basic settings view and define its properties.
- Set the destination file name as well as the Sheet name and select the Include header check box.
- Do the same for the second tFileOutputExcel.