Removing non-matching values - 6.2

Talend Real-time Big Data Platform Studio User Guide

Applying the email pattern to the email column showed that some records do not follow the standard email format. You can generate a ready-to-use Job that retrieves the non-matching rows from this column.
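Conceptually, a pattern matching analysis checks each value of a column against a regular expression and counts the values that match and do not match. The Python sketch below shows that idea outside Talend Studio. The regular expression and the function name `pattern_matching_counts` are hypothetical. They are not the email pattern or an API provided by Talend.

```python
import re

# Assumption: a simplified email pattern for illustration only; it is not
# the regular expression behind Talend's email pattern.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def pattern_matching_counts(values, pattern=EMAIL_PATTERN):
    """Count the column values that match and do not match `pattern`.

    Missing values (None) are counted as non-matching.
    """
    matching = sum(1 for v in values if v is not None and pattern.fullmatch(v))
    return {"matching": matching, "non_matching": len(values) - matching}


emails = ["jane.doe@example.com", "john.smith@example", "info@example.org", ""]
print(pattern_matching_counts(emails))
# {'matching': 2, 'non_matching': 2}
```

The rows counted as `non_matching` are the ones the generated Job isolates.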

To retrieve the non-matching email rows:

  1. In the Profiling perspective, click the Analysis Results tab at the bottom of the editor.

  2. In the Pattern Matching results for the email column, right-click the chart bar or the numerical results and select Generate Job.

  3. In the dialog box that opens, select Generate an ETL Job to handle rows.

    The Integration perspective opens on the generated Job.

    This Job uses an Extract, Transform, Load (ETL) process to write the valid email rows, which match the pattern, and the invalid email rows, which do not, to two separate output files.

  4. Save the Job and press F6 to execute it.

    The valid and invalid rows of the email column are written to the defined output files.

    You can replace the output files with other Talend components, for example to write the valid and invalid email rows to a database instead.

    You can follow the same procedure to retrieve the invalid rows from the postal column.

    For more information on using the Profiling perspective to identify and remove corrupt, incomplete, or inaccurate data, see the chapter about data cleansing in the Talend Studio User Guide.
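The generated ETL Job splits the rows of a column into two outputs according to whether they match the pattern. The Python sketch below shows that split logic under stated assumptions: input rows are dictionaries, the outputs are CSV files, and both regular expressions are simplified examples. The function `split_rows` and the five-digit postal code pattern are hypothetical. This is not the code that Talend Studio generates for the Job.

```python
import csv
import re
import tempfile
from pathlib import Path

# Assumption: simplified patterns for illustration only; they are not the
# patterns shipped with Talend Studio.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
POSTAL_PATTERN = re.compile(r"^\d{5}$")  # hypothetical five-digit postal code


def split_rows(rows, column, pattern, valid_path, invalid_path):
    """Write rows whose `column` value matches `pattern` to valid_path and
    all other rows to invalid_path. Returns (valid_count, invalid_count)."""
    rows = list(rows)
    if not rows:
        return 0, 0  # nothing to write; no files are created
    fieldnames = list(rows[0].keys())
    valid_count = invalid_count = 0
    with open(valid_path, "w", newline="") as valid_file, \
         open(invalid_path, "w", newline="") as invalid_file:
        valid_writer = csv.DictWriter(valid_file, fieldnames=fieldnames)
        invalid_writer = csv.DictWriter(invalid_file, fieldnames=fieldnames)
        valid_writer.writeheader()
        invalid_writer.writeheader()
        for row in rows:
            value = row.get(column) or ""  # missing values count as non-matching
            if pattern.fullmatch(value):
                valid_writer.writerow(row)
                valid_count += 1
            else:
                invalid_writer.writerow(row)
                invalid_count += 1
    return valid_count, invalid_count


customers = [
    {"id": "1", "email": "jane.doe@example.com", "postal": "75001"},
    {"id": "2", "email": "john.smith@example", "postal": "7500"},
    {"id": "3", "email": "info@example.org", "postal": "69002"},
]

with tempfile.TemporaryDirectory() as tmp:
    print(split_rows(customers, "email", EMAIL_PATTERN,
                     Path(tmp, "valid_emails.csv"), Path(tmp, "invalid_emails.csv")))
    # (2, 1)
    # The same function handles the postal column with another pattern.
    print(split_rows(customers, "postal", POSTAL_PATTERN,
                     Path(tmp, "valid_postal.csv"), Path(tmp, "invalid_postal.csv")))
    # (2, 1)
```

Keeping the column name and pattern as parameters mirrors the point that the same procedure applies to the postal column. Sending the rows to a database instead of files would change only the two writers, not the matching logic.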