Identifying anomalies in data - 7.3

Talend Big Data Platform Getting Started Guide

Version: 7.3
Language: English
Operating system: Big Data Platform
Product: Talend Big Data Platform
Module: Talend Administration Center, Talend DQ Portal, Talend Installer, Talend Runtime, Talend Studio
Content: Data Quality and Preparation > Cleansing data; Data Quality and Preparation > Profiling data; Design and Development; Installation and Upgrade
Last publication date: 2023-07-24

This use case explains how to use the Profiling perspective of Talend Studio to analyze customer email addresses and phone numbers. It applies out-of-the-box indicators and patterns to these columns and shows which email and phone values match the patterns and which do not.
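To illustrate what "matching" and "non-matching" data means, the Python sketch below checks column values against a pattern and splits them into two groups. The regular expressions `EMAIL_PATTERN` and `PHONE_PATTERN` and the helper `split_by_pattern` are hypothetical simplifications for illustration. They are not the patterns shipped with Talend Studio, and the phone pattern assumes a North American format such as 555-123-4567.

```python
import re

# Hypothetical, simplified patterns for illustration only; these are NOT
# the regular expressions that ship with Talend Studio.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumes a North American style number: 555-123-4567 or (555) 123-4567.
PHONE_PATTERN = re.compile(r"(\(\d{3}\) ?|\d{3}-)\d{3}-\d{4}")


def split_by_pattern(values, pattern):
    """Return (matching, non_matching) lists for one column."""
    matching, non_matching = [], []
    for value in values:
        # Treat None as an empty string so it lands in non_matching.
        text = value or ""
        # fullmatch requires the whole value to fit the pattern.
        if pattern.fullmatch(text):
            matching.append(value)
        else:
            non_matching.append(value)
    return matching, non_matching


emails = ["ann@example.com", "bob@example", "", "carl@mail.example.org"]
phones = ["555-123-4567", "(555) 987-6543", "5551234567"]

ok, bad = split_by_pattern(emails, EMAIL_PATTERN)
print(f"Email: {len(ok)} matching, {len(bad)} non-matching: {bad}")
ok, bad = split_by_pattern(phones, PHONE_PATTERN)
print(f"Phone: {len(ok)} matching, {len(bad)} non-matching: {bad}")
```

Using `fullmatch` rather than `search` matters here: a value such as `x ann@example.com y` contains a valid-looking address but should still count as non-matching.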

Profiling Jobs are then generated from the analysis results to clean the customer data and monitor how it evolves over time.
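The Jobs themselves are generated in Talend Studio. As a rough analogue only, the Python sketch below shows the kind of cleansing logic involved: a row is kept only if its email fully matches a pattern and has not already been kept, and every other row is routed to a rejected list instead of being discarded. The function `clean_customers`, the `Email` field layout, and the simplified `EMAIL_PATTERN` are assumptions made for illustration, not Talend code. Duplicate detection here is an exact, case-sensitive comparison.

```python
import re

# Hypothetical simplified email pattern; not Talend's built-in pattern.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def clean_customers(rows, column="Email", pattern=EMAIL_PATTERN):
    """Split rows into (kept, rejected).

    A row is rejected when its value in `column` does not fully match
    `pattern`, or when it repeats a value already kept (first one wins).
    """
    kept, rejected = [], []
    seen = set()
    for row in rows:
        value = row.get(column) or ""
        if not pattern.fullmatch(value) or value in seen:
            rejected.append(row)
        else:
            seen.add(value)
            kept.append(row)
    return kept, rejected


customers = [
    {"id": 1, "Email": "ann@example.com"},
    {"id": 2, "Email": "ann@example.com"},  # duplicate of id 1
    {"id": 3, "Email": "not-an-email"},     # fails the pattern
    {"id": 4, "Email": "bob@example.org"},
]
kept, rejected = clean_customers(customers)
print([r["id"] for r in kept])      # [1, 4]
print([r["id"] for r in rejected])  # [2, 3]
```

Keeping the rejected rows makes it possible to review them later, and rerunning the same logic on fresh data gives a simple way to track whether data quality improves over time.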

You can then use the Data Explorer perspective to browse the non-matching data.

The sequence of profiling and cleansing customer data involves the following steps:

Procedure

  1. Create a column analysis on customer email addresses and phone numbers.
  2. From the analysis editor, connect to the database that holds the customer data.
  3. Add indicators that provide simple statistics on the data, such as row, blank, and duplicate counts.
  4. Add standard patterns against which to match email addresses and phone numbers.
  5. Execute the analysis to show results in tables and charts.
  6. Access a view of the analyzed data to see invalid records.
  7. Generate out-of-the-box Jobs from the analysis results to remove duplicate values from the Email and Phone columns.
  8. Generate out-of-the-box Jobs from the analysis results to remove values that do not match the standard email format or phone number format from the Email and Phone columns, respectively.
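Step 3 adds simple statistics indicators. A minimal Python sketch of such counts over a single column follows. The definitions are assumptions: a blank is taken to be a non-null value that is empty or contains only whitespace, and the duplicate count is the number of distinct values that occur more than once. Talend's own indicators may define these differently, so check the indicator reference before comparing numbers. The function name `simple_statistics` is hypothetical.

```python
from collections import Counter


def simple_statistics(values):
    """Compute row, null, blank, distinct, unique, and duplicate counts.

    Assumed definitions (may differ from Talend's indicators):
    - blank: a non-null value that is empty or whitespace only
    - duplicate: a distinct value that occurs more than once
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "blank_count": sum(1 for v in non_null if v.strip() == ""),
        "distinct_count": len(counts),
        "unique_count": sum(1 for c in counts.values() if c == 1),
        "duplicate_count": sum(1 for c in counts.values() if c > 1),
    }


phones = ["555-123-4567", "555-123-4567", "", None, "(555) 987-6543", "  "]
print(simple_statistics(phones))
```

With these definitions, the unique count plus the duplicate count always equals the distinct count, which is a quick sanity check on the results.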