Removing non-matching values - 7.0

Data Quality Job and Analysis Examples

Author: Talend Documentation Team
Version: 7.0
Products: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Open Studio for Data Quality, Talend Open Studio for MDM, Talend Real-Time Big Data Platform
Task: Data Quality and Preparation
Platform: Talend Studio
The email pattern applied to the email column showed that some records do not follow the standard email format. You can generate a ready-to-use Job that retrieves the non-matching rows from the column.
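The match/non-match split reported in the Pattern Matching results can be sketched as follows. This is a minimal Python illustration, not Talend code: `EMAIL_PATTERN` is a simplified regular expression assumed for the example (not the exact pattern from Talend's pattern library), and `pattern_match_counts` is a hypothetical helper name.

```python
import re

# Simplified email pattern; an assumption, not the exact regex
# shipped in Talend's pattern library.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pattern_match_counts(values, pattern=EMAIL_PATTERN):
    """Return (matching, non_matching) counts for a column's values.

    A value counts as matching only if the whole value matches the pattern;
    None (a missing value) is counted as non-matching.
    """
    matching = sum(1 for v in values if v is not None and pattern.fullmatch(v))
    return matching, len(values) - matching


if __name__ == "__main__":
    emails = ["alice@example.com", "bob@example", "carol.at.example.com", "dan@mail.example.org"]
    print(pattern_match_counts(emails))  # (2, 2)
```

The rows counted as non-matching here are the ones the generated Job is meant to retrieve.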

Procedure

  1. In the Profiling perspective, click the Analysis Results tab at the bottom of the editor.
  2. In the Pattern Matching results of the email column, right-click the chart bar or the numerical results and select Generate Job.
  3. In the dialog box that opens, select Generate an ETL Job to handle rows.

    The Integration perspective opens on the generated Job.

    This Job uses an Extract, Transform, Load (ETL) process to write the valid email rows (those that match the pattern) and the invalid email rows (those that do not) to two separate output files.

  4. Save the Job and press F6 to execute it.
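The extract, classify, and load flow that the generated Job performs can be sketched in Python. This is only an illustration of the logic, not the Java code Talend Studio generates; the CSV input, the `email` column name, the file names, and the simplified `EMAIL_PATTERN` are all assumptions for the example.

```python
import csv
import os
import re
import tempfile

# Simplified email pattern; an assumption, not the exact regex from
# Talend's pattern library.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def split_rows(in_path, valid_path, invalid_path, column="email"):
    """Extract rows from a CSV file, test `column` against EMAIL_PATTERN,
    and load matching rows into valid_path and the others into invalid_path.
    Returns (valid_count, invalid_count)."""
    valid = invalid = 0
    with open(in_path, newline="") as src, \
         open(valid_path, "w", newline="") as ok, \
         open(invalid_path, "w", newline="") as bad:
        reader = csv.DictReader(src)
        # Both outputs keep the input schema, like two output file components.
        ok_writer = csv.DictWriter(ok, fieldnames=reader.fieldnames)
        bad_writer = csv.DictWriter(bad, fieldnames=reader.fieldnames)
        ok_writer.writeheader()
        bad_writer.writeheader()
        for row in reader:
            # Transform step: route each row by whether the column value matches.
            if EMAIL_PATTERN.fullmatch(row.get(column) or ""):
                ok_writer.writerow(row)
                valid += 1
            else:
                bad_writer.writerow(row)
                invalid += 1
    return valid, invalid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "customers.csv")
        with open(src, "w", newline="") as f:
            f.write("id,email\n1,alice@example.com\n2,bob@example\n3,carol@mail.example.org\n")
        print(split_rows(src, os.path.join(tmp, "valid.csv"), os.path.join(tmp, "invalid.csv")))  # (2, 1)
```

Keeping the full input schema on both outputs means the invalid rows can be inspected or corrected and then reloaded without losing any columns.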

Results

The valid and invalid rows of the email column are written to the output files defined in the Job.

You can replace the output file components with other Talend components to send the valid and invalid email rows elsewhere, for example to write them to a database.
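Routing the rows to database tables instead of files follows the same split. The sketch below uses Python's built-in `sqlite3` module as a stand-in for whatever database your Job targets; the table names `valid_emails` and `invalid_emails`, the `load_to_db` helper, and the simplified `EMAIL_PATTERN` are hypothetical.

```python
import re
import sqlite3

# Simplified email pattern; an assumption, not Talend's exact regex.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def load_to_db(conn, rows):
    """Insert (id, email) tuples into valid_emails or invalid_emails
    (hypothetical table names) depending on whether the email matches."""
    conn.execute("CREATE TABLE IF NOT EXISTS valid_emails (id INTEGER, email TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS invalid_emails (id INTEGER, email TEXT)")
    for row_id, email in rows:
        # The table name comes from a fixed pair, so formatting it in is safe;
        # the values themselves are passed as query parameters.
        table = "valid_emails" if EMAIL_PATTERN.fullmatch(email or "") else "invalid_emails"
        conn.execute(f"INSERT INTO {table} (id, email) VALUES (?, ?)", (row_id, email))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_to_db(conn, [(1, "alice@example.com"), (2, "bob@example")])
    print(conn.execute("SELECT id FROM invalid_emails").fetchall())  # [(2,)]
```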

What to do next

You can follow the same procedure to retrieve the invalid rows from the postal column.
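Only the pattern changes when you apply the procedure to another column. As a hedged sketch, the helper below takes the column name and pattern as parameters; `POSTAL_PATTERN` (five digits with an optional four-digit extension) is a hypothetical example, so substitute the pattern your analysis actually applies to the postal column.

```python
import re

# Hypothetical postal pattern (US ZIP / ZIP+4 style); replace it with the
# pattern used in your own analysis. [0-9] is used instead of \d so that
# only ASCII digits are accepted.
POSTAL_PATTERN = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def invalid_rows(rows, column, pattern):
    """Return the rows whose `column` value does not fully match `pattern`.
    Missing values are treated as empty strings and therefore as invalid."""
    return [row for row in rows if not pattern.fullmatch(str(row.get(column) or ""))]


if __name__ == "__main__":
    rows = [
        {"id": 1, "postal": "94107"},
        {"id": 2, "postal": "9410"},
        {"id": 3, "postal": "94107-1234"},
        {"id": 4, "postal": None},
    ]
    print([r["id"] for r in invalid_rows(rows, "postal", POSTAL_PATTERN)])  # [2, 4]
```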

For further information on using the Profiling perspective to identify and remove corrupt, incomplete or inaccurate data, see the Data Cleansing chapter in Talend Studio User Guide.