You want to match the content of the email column against a standard email
format and the postal column against a standard US ZIP code format.
This profiles the content, structure, and quality of the email addresses and ZIP codes and
gives the percentage of the data that matches the standard formats and the percentage
that does not.
Procedure
- In the Analyzed Columns view, click the icon next to email to open the Pattern Selector dialog box.
- Expand the pattern folder, select the Email Address check box, and click OK to close the dialog box.
- Click the icon next to the Email Address indicator and set the Lower threshold (%) field to 98.0.
  If fewer than 98% of the records match the pattern, the result is written in red in the analysis results.
- Do the same for the postal column: add the US Zipcode Validation pattern from the address folder.
For more information on pattern types and their usage when analyzing data,
see Patterns in the Talend Studio
User Guide.
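
As a rough illustration of what the pattern indicators and the lower threshold compute, the following Python sketch checks each value against a regular expression, derives the percentage of matching records, and flags the result when it falls below 98%. The two regular expressions, the sample values, and the helper names match_rate and below_threshold are assumptions made for this example; they are not the pattern definitions or the implementation that Talend Studio uses.

```python
import re

# Illustrative patterns only (assumption): simplified stand-ins for the
# Email Address and US Zipcode Validation patterns, not Talend's definitions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_ZIP_RE = re.compile(r"\d{5}(-\d{4})?")


def match_rate(values, pattern):
    """Return the percentage of values that fully match the pattern."""
    if not values:
        return 0.0
    matching = sum(1 for v in values if v is not None and pattern.fullmatch(v))
    return 100.0 * matching / len(values)


def below_threshold(rate, lower_threshold=98.0):
    """Return True when the match rate is under the lower threshold (%)."""
    return rate < lower_threshold


# Hypothetical sample data standing in for the email and postal columns.
emails = ["ann@example.com", "bob@example.org", "not-an-email", "eve@example.net"]
postal = ["02139", "94105-1234", "1234", "60601"]

email_rate = match_rate(emails, EMAIL_RE)
zip_rate = match_rate(postal, US_ZIP_RE)

print(f"email:  {email_rate:.1f}% matching, flagged={below_threshold(email_rate)}")
print(f"postal: {zip_rate:.1f}% matching, flagged={below_threshold(zip_rate)}")
```

In this sketch a flagged result corresponds to a value shown in red in the analysis results: three of the four sample values match in each column, so both rates fall below the 98% threshold.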