Removing non-matching values - 7.3

Talend Data Fabric Getting Started Guide

Last publication date
2023-07-24

The results of the patterns applied to the Email and Phone columns show that some records do not respect the standard email and phone formats. For details, see Showing analysis results.

From the analysis results, you can generate ready-to-use Jobs to recover the non-matching rows from the columns.

You can follow the same procedure to remove non-matching values from the Email or Phone columns.

Before you begin

  • You have opened the Profiling perspective in the Studio.

  • You have created and executed the column analysis. For further information, see Identifying anomalies in data.

Procedure

  1. Open the column analysis in the Profiling perspective and click Analysis Results at the bottom of the editor.
  2. In the Pattern Matching tables of the Email or Phone column, right-click the results and select Generate Job.

    This example uses the results of the US Phone numbers pattern applied to the Phone column.

  3. In the wizard that opens, click Finish to confirm the creation of the Job.

    The Integration perspective opens showing the generated Job, and the Job is listed in the Repository tree view.

    This Job uses the Extract, Transform, Load (ETL) process to write the Phone rows to two separate output files: one for the rows that match the pattern and one for the rows that do not.

    The tMysqlInput component is automatically configured according to your connection, and the tPatternCheck component is automatically configured according to the column you analyzed.

  4. Double-click each of the output components and change the default name or path of the output file, if needed.
  5. Press F6 to execute the Job.

    Matching and non-matching phone numbers are written to two separate output files.

  6. Right-click each of the tFileOutputDelimited components and select Data Viewer to view the rows that match the phone pattern and the rows that do not.
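
The logic of the generated Job can be sketched outside the Studio as follows. This is only an illustration, not the code the Studio generates: the regular expression is a hypothetical stand-in for the US Phone numbers pattern, and the names split_by_pattern and write_delimited, the semicolon delimiter and the sample rows are assumptions made for this example.

```python
import csv
import os
import re
import tempfile

# Hypothetical stand-in for the "US Phone numbers" pattern; the regular
# expression shipped with the Studio may differ.
US_PHONE = re.compile(r"^(\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4}$")


def split_by_pattern(rows, column, pattern):
    """Return (matching, non_matching) lists of rows for the given column."""
    matching, non_matching = [], []
    for row in rows:
        value = row.get(column) or ""  # treat a missing value as non-matching
        target = matching if pattern.fullmatch(value.strip()) else non_matching
        target.append(row)
    return matching, non_matching


def write_delimited(path, rows, fieldnames):
    """Write rows to a semicolon-delimited file, header line first."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter=";")
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample rows, used only to exercise the functions.
    customers = [
        {"id": "1", "Phone": "(415) 555-0132"},
        {"id": "2", "Phone": "555-0132"},
        {"id": "3", "Phone": "415.555.0199"},
    ]
    ok, ko = split_by_pattern(customers, "Phone", US_PHONE)
    with tempfile.TemporaryDirectory() as folder:
        write_delimited(os.path.join(folder, "matching.csv"), ok, ["id", "Phone"])
        write_delimited(os.path.join(folder, "non_matching.csv"), ko, ["id", "Phone"])
        print(len(ok), "matching rows,", len(ko), "non-matching rows")
```

The two lists play the role of the two output flows of the pattern check, and each call to write_delimited stands in for one delimited output file.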

Results

You can then design a Job, for example, to standardize the phone numbers that match the pattern and convert them to the correct international format by using the tStandardizePhoneNumber component.
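
As a rough idea of what such a standardization step does, the sketch below converts 10-digit US numbers to the E.164 form (+1 followed by the ten digits). It does not reproduce the behavior of the tStandardizePhoneNumber component: the function name to_e164_us is hypothetical, the choice of E.164 as the target format is an assumption, and only the digit count is validated.

```python
import re


def to_e164_us(phone):
    """Return '+1XXXXXXXXXX' for a 10-digit US number, or None otherwise."""
    digits = re.sub(r"\D", "", phone)  # drop spaces, dots, dashes, parentheses
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # remove a leading country code
    if len(digits) != 10:
        return None  # not a 10-digit national number
    return "+1" + digits


if __name__ == "__main__":
    for raw in ["(415) 555-0132", "415.555.0199", "555-0132"]:
        print(raw, "->", to_e164_us(raw))
```

Numbers that cannot be reduced to ten digits return None, so they can be routed to a reject flow instead of being written with a wrong prefix.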