Scenario: Verify email addresses against column content and domain names - 6.1

Talend Components Reference Guide

This scenario describes a Job which uses:

  • the tFixedFlowInput component to generate the email addresses to be analyzed,

  • the tVerifyEmail component to verify and format the email addresses through the Talend email API,

  • the tFileOutputExcel component to output the formatted addresses in an .xls file.

Setting up the Job

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tVerifyEmail and tFileOutputExcel.

  2. Connect the three components together using the Main links.

Configuring the input component

  1. Double-click tFixedFlowInput to open its Basic settings view in the Component tab.

  2. Create the schema through the Edit Schema button.

    In the dialog box that opens, click the [+] button and add the columns that will hold the input address data. For this example, add firstname, lastname and email.

  3. Click OK.

  4. In the Number of rows field, enter 1.

  5. In the Mode area, select the Use Inline Table option.

  6. In the Inline table, use the [+] button to add lines to the table and then enter the address data you want to analyze.
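For illustration, the inline table could hold rows like the ones below. This is a minimal Python sketch rather than Talend code: only the three column names come from the schema defined above, and every name and address is a made-up sample value.

```python
# Hypothetical sample rows for the inline table; only the column names
# (firstname, lastname, email) come from the schema defined above.
COLUMNS = ("firstname", "lastname", "email")

rows = [
    ("Jane", "Doe", "jdoe@example.com"),
    ("John", "Smith", "john.smith@example.org"),
    ("Ana", "Lopez", "alopez@example.invalid"),
]

# Pair each value with its column name, as a row-based flow would carry it.
records = [dict(zip(COLUMNS, row)) for row in rows]

for record in records:
    print(record)
```

Mixing addresses that follow the expected pattern with ones that do not makes the effect of the verification options easier to see in the results.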

Verifying and formatting email addresses

  1. Double-click tVerifyEmail to display the Basic settings view and define the component properties.

  2. If required, click Sync columns to retrieve the schema defined in the input component.

  3. Click the Edit schema button to open the schema dialog box.

    tVerifyEmail proposes predefined read-only address columns.

    The VerificationLevel column returns the verification status of each input email address. The SuggestedEmail column returns suggested content for the part of the email before the @ sign. This column appears in the output schema only if you select the Use column content option in the LOCAL Part Options section. For further information about output columns, see tVerifyEmail properties.

  4. Move any of the input columns to the output schema if you want to show them in the verification results, click OK, and accept the prompt to propagate the changes.

  5. From the Column to validate list, select the email column.

  6. In the LOCAL Part Options section, select the Use column content option.

    In this example, you want to check the email part before the @ sign to see if it starts with the first letter of the first name followed by the family name, all in lower case. If the local part does not match what you have defined, tVerifyEmail will rewrite it by using the parameters you define.

  7. In the DOMAIN Part Options, select:

    • the Check the default Top-level Domains and the following ones check box and define in the table the additional top-level domain against which you want to validate email addresses.

    • the Check domains with a black list check box and define in the Domain list table the domains to treat as blacklisted.

  8. Select the Check with mail server callback check box to enable the mail server to verify the complete address and accept or reject the email.
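The local part rule and the domain checks configured in the steps above can be sketched in Python as follows. This only mimics the rules as described in this scenario and is not how tVerifyEmail is implemented; the function names, the stand-in list of default top-level domains, the additional domain and the blacklisted domain are all hypothetical.

```python
# Illustrative only: this mimics the rules described above, not Talend's
# actual implementation. All names and values below are hypothetical.

DEFAULT_TLDS = {"com", "org", "net"}     # assumed stand-in for the default list
EXTRA_TLDS = {"fr"}                      # additional TLD you would add in the table
BLACKLISTED_DOMAINS = {"spam.example"}   # hypothetical blacklisted domain


def expected_local_part(firstname: str, lastname: str) -> str:
    """First letter of the first name followed by the family name, lower case."""
    return (firstname[:1] + lastname).lower()


def split_email(email: str) -> tuple:
    """Split an address into (local part, lower-cased domain) at the last @ sign."""
    local, _, domain = email.rpartition("@")
    return local, domain.lower()


def tld_allowed(domain: str) -> bool:
    """True if the domain ends with a default or additional top-level domain."""
    tld = domain.rsplit(".", 1)[-1]
    return tld in DEFAULT_TLDS | EXTRA_TLDS


def is_blacklisted(domain: str) -> bool:
    """True if the domain appears in the blacklist table."""
    return domain in BLACKLISTED_DOMAINS


local, domain = split_email("John.Smith@Example.fr")
print(local == expected_local_part("John", "Smith"))  # local part differs from the rule
print(tld_allowed(domain), is_blacklisted(domain))
```

The mail server callback is left out of the sketch because it depends on a live connection to the mail server for each domain.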

Configuring the output component and executing the Job

  1. Double-click the tFileOutputExcel component to display the Basic settings view and define the component properties.

  2. Set the destination file name as well as the sheet name and then select the Define all columns auto size check box.

  3. Save your Job and press F6 to execute it.

    The tVerifyEmail component analyzes email addresses and corrects those that do not match what you have defined in the local and domain part options.

  4. Right-click the output component and select Data Viewer to display the formatted email addresses.

    tVerifyEmail matches input addresses against the rule you set in the LOCAL Part Options section and the parameters you set for the domain names.

    The VerificationLevel output column returns one of the statuses VALID, INVALID, CORRECTED or REJECTED, depending on the options you selected in the tVerifyEmail Basic settings view.

    All email addresses labeled as CORRECTED have a suggested address in the SuggestedEmail output column.
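To make the status values more concrete, here is a rough Python sketch of how one row could be classified. The four status names come from this scenario, but which check leads to which status is an assumption made for illustration, as is returning a full address as the suggestion. The verify function and its parameters are hypothetical and are not part of any Talend API.

```python
import re

# Assumed, simplified mapping from checks to status values. The four status
# names come from the text above; which check yields which status is a guess.
SIMPLE_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def verify(firstname, lastname, email, allowed_tlds, blacklist):
    """Return a (VerificationLevel, SuggestedEmail) pair for one input row."""
    # Malformed address: nothing more can be checked.
    if not SIMPLE_EMAIL.match(email):
        return "INVALID", None
    local, domain = email.rsplit("@", 1)
    domain = domain.lower()
    # Top-level domain outside the allowed set (assumed to count as INVALID).
    if domain.rsplit(".", 1)[-1] not in allowed_tlds:
        return "INVALID", None
    # Blacklisted domain (assumed to count as REJECTED).
    if domain in blacklist:
        return "REJECTED", None
    # Local part rule: first letter of the first name + family name, lower case.
    expected = (firstname[:1] + lastname).lower()
    if local != expected:
        return "CORRECTED", f"{expected}@{domain}"
    return "VALID", None


print(verify("John", "Smith", "john.smith@example.com", {"com"}, set()))
print(verify("Jane", "Doe", "jdoe@example.com", {"com"}, set()))
```

In this sketch only CORRECTED rows carry a suggestion, which mirrors the statement above that CORRECTED addresses have a value in the SuggestedEmail column.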