Identifying anomalies in data

Talend Data Fabric Getting Started Guide

author
Talend Documentation Team
EnrichVersion
6.4
EnrichProdName
Talend Data Fabric
task
Design and Development
Installation and Upgrade
Data Quality and Preparation > Profiling data
Data Quality and Preparation > Cleansing data

This use case explains how to use the Profiling perspective of the studio to analyze customer email addresses and phone numbers. It applies out-of-box indicators and patterns to the columns and shows the matching and non-matching data.

Profiling Jobs are then generated from the analysis results to clean the customer data and monitor its evolution.

You can then use the Data Explorer perspective to browse the non-matching data.

The sequence of profiling and cleansing customer data involves the following steps:

Procedure

  1. Create a column analysis on customer email addresses and phone numbers. For further information, see Defining a column analysis.
  2. From the analysis editor, connect to the database that holds the customer data. For further information, see Creating the database connection.
  3. Add indicators to provide simple statistics on the data, such as row, blank and duplicate counts. For further information, see Setting system indicators.
  4. Add standard patterns against which to match email addresses and phone numbers. For further information, see Setting patterns.
  5. Execute the analysis to show results in tables and charts. For further information, see Showing analysis results.
  6. Access a view of the analyzed data to see invalid records. For further information, see Browsing non-match data.
  7. Generate out-of-box Jobs from analysis results to remove duplicate values from the Email and Phone columns. For further information, see Removing duplicate values.
  8. Generate out-of-box Jobs from analysis results to remove, from the Email and Phone columns respectively, the values that do not match the standard email format or phone number format. For further information, see Removing non-matching values.
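
The logic behind steps 3 to 8 can be illustrated outside the studio with a minimal Python sketch. It is not what the studio or its generated Jobs actually run: the function names, the sample values and the two regular expressions are hypothetical simplifications, not the patterns shipped with the product. The sketch assumes that the duplicate count is the number of distinct values that appear more than once, and that cleansing keeps the first occurrence of each valid value.

```python
import re
from collections import Counter

# Hypothetical, simplified patterns for illustration only;
# they are NOT the standard patterns provided by the studio.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{6,}")


def is_blank(value):
    """A value is treated as blank when it is None or only whitespace."""
    return value is None or not value.strip()


def profile_column(values, pattern):
    """Compute simple indicators and pattern-match results for one column."""
    non_blank = [v for v in values if not is_blank(v)]
    counts = Counter(non_blank)
    return {
        "row_count": len(values),
        "blank_count": len(values) - len(non_blank),
        # Assumption: number of distinct values that occur more than once.
        "duplicate_count": sum(1 for c in counts.values() if c > 1),
        "matching": [v for v in non_blank if pattern.fullmatch(v)],
        "non_matching": [v for v in non_blank if not pattern.fullmatch(v)],
    }


def clean_column(values, pattern):
    """Drop blank, non-matching and repeated values, keeping first occurrences."""
    seen = set()
    cleaned = []
    for v in values:
        if is_blank(v) or not pattern.fullmatch(v) or v in seen:
            continue
        seen.add(v)
        cleaned.append(v)
    return cleaned


if __name__ == "__main__":
    # Made-up sample data standing in for the Email column.
    emails = ["ann@example.com", "bob@example", "ann@example.com", "",
              "carol@example.org"]
    print(profile_column(emails, EMAIL_PATTERN))
    print(clean_column(emails, EMAIL_PATTERN))
```

In this sketch, profile_column plays the role of the indicators and patterns set on a column, its non_matching list corresponds to the invalid records you would browse in the Data Explorer perspective, and clean_column stands in for the Jobs that remove duplicate and non-matching values.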