Fraud Detection - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-03-28
Available in:
  • Big Data Platform
  • Cloud API Services Platform
  • Cloud Big Data Platform
  • Cloud Data Fabric
  • Cloud Data Management Platform
  • Data Fabric
  • Data Management Platform
  • Data Services Platform
  • MDM Platform
  • Real-Time Big Data Platform

The Benford Law indicator (first-digit law) examines how often each of the digits 1 through 9 actually occurs as the leading digit in numerical data. It is usually used as an indicator of accounting and expense fraud in lists or tables.

Benford's law states that in many naturally occurring lists and tables of numbers, the digit 1 tends to occur as the leading digit about 30% of the time. Larger digits occur as the leading digit with progressively lower frequency: the digit 2 about 17.6% of the time, the digit 3 about 12.5%, and so on. Valid, unaltered data of this kind is expected to follow this frequency. A simple comparison of the first-digit frequency distribution of the data you analyze with the expected distribution according to Benford's law should reveal any anomalous results.
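
The expected percentages come from the formula P(d) = log10(1 + 1/d), where d is the leading digit. The following minimal Python sketch, which is an illustration and not part of Talend Studio, prints the expected share for each digit. The function name `benford_expected` is chosen for this example only.

```python
import math

def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Expected first-digit distribution according to Benford's law.
expected = {d: benford_expected(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

The nine shares add up to 100%, with 1 at about 30.1% and 9 at about 4.6%.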

For example, suppose an employee has committed fraud by creating and sending payments to a fictitious vendor. Since the amounts of these fictitious payments are made up rather than occurring naturally, the leading-digit distribution of all transactions, fictitious and valid mixed together, will no longer follow Benford's law. Furthermore, assume that many of these fraudulent payments have 2 as the leading digit, such as 29, 232, or 2,187. When you use the Benford Law indicator to analyze such data, you should see that amounts with the leading digit 2 occur more frequently than the expected rate of about 17.6%.
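
The scenario above can be sketched in a few lines of Python. The payment amounts below are hypothetical values made up for illustration, apart from 29, 232, and 2187, which come from the example. The sample is far too small to be statistically meaningful and only shows the mechanics of the comparison, not how Talend Studio computes the indicator.

```python
import math
from collections import Counter

def leading_digit(amount: int) -> int:
    """First digit of a positive integer amount."""
    return int(str(amount)[0])

# Hypothetical amounts, for illustration only.
valid = [12, 118, 1450, 17, 163, 1021, 27, 240, 31, 356,
         45, 520, 68, 79, 810, 94, 1302, 19, 3300, 4100]
fictitious = [29, 232, 2187, 215, 2650, 28]  # made-up payments, all starting with 2

payments = valid + fictitious
counts = Counter(leading_digit(a) for a in payments)
total = len(payments)

for d in range(1, 10):
    observed = counts[d] / total
    expected = math.log10(1 + 1 / d)
    gap = observed - expected
    print(f"{d}: observed {observed:.1%}, expected {expected:.1%}, gap {gap:+.1%}")
```

In this sample, the digit 2 leads 8 of the 26 amounts, well above its expected share, which is the kind of gap the indicator is meant to surface.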

When using the Benford Law indicator, verify that the numerical data you analyze does not start with 0, because Benford's law expects the leading digit to range only from 1 to 9. You can verify this by using the number > Integer values pattern on the column you analyze.
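
Outside of Talend Studio, a comparable pre-check could look like the sketch below. The regular expression and the function name `split_by_leading_digit` are assumptions made for this example; they are not the definition of the Integer values pattern.

```python
import re

# Assumed rule for this sketch: a value is usable when its first character is 1 to 9.
STARTS_WITH_1_TO_9 = re.compile(r"^[1-9]")

def split_by_leading_digit(values):
    """Separate values that start with 1-9 from those that do not."""
    usable, rejected = [], []
    for value in values:
        text = str(value).strip()
        if STARTS_WITH_1_TO_9.match(text):
            usable.append(value)
        else:
            rejected.append(value)
    return usable, rejected

usable, rejected = split_by_leading_digit(["120", "0.75", "034", "9", 2187, "0"])
print(usable)    # ['120', '9', 2187]
print(rejected)  # ['0.75', '034', '0']
```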

In the result chart of the Benford Law indicator, digits 1 through 9 are represented by bars, and the height of each bar is the percentage of the analyzed data that starts with that digit. The dots represent the expected first-digit frequency distribution according to Benford's law.

Below is an example of the results of an analysis after using the Benford Law indicator on a column.

[Figure: Example of analysis results against the Benford Law indicator.]

The chart shows that the actual distribution of the data (height of the bars) does not follow Benford's law (dot values). The frequency distribution of the sales figures differs substantially from the expected distribution according to Benford's law. For example, sales figures that start with 1 are expected to make up about 30% of the data, but in the analyzed data they represent only 25%. Fraud can be suspected here: sales figures may have been modified by someone, or some data may be missing.

The orange bar labeled invalid shows the percentage of the analyzed data that does not start with a digit. Such values are not expected when analyzing columns with the Benford Law indicator, which is why this bar is shown in orange.
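
As a rough illustration of how such an invalid bucket can arise, the Python sketch below tallies leading digits and groups everything else under `invalid`. The column values are hypothetical, and treating empty values and a leading 0 as invalid is an assumption of this sketch, not documented behavior of the indicator.

```python
from collections import Counter

def bucket(value) -> str:
    """Return the leading digit '1' to '9', or 'invalid' otherwise."""
    text = str(value).strip()
    # Assumption of this sketch: empty values and a leading 0 also count as invalid.
    return text[0] if text and text[0] in "123456789" else "invalid"

# Hypothetical column values, made up for illustration.
column = ["1200", "185", "N/A", "230", "-45", "19", "", "310", "$99", "1075"]

counts = Counter(bucket(v) for v in column)
total = len(column)

for key in [*"123456789", "invalid"]:
    print(f"{key}: {counts[key] / total:.0%}")
```

Here four of the ten values ("N/A", "-45", the empty string, and "$99") fall into the invalid bucket.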

The following table shows the indicators that you can select in any database:

Indicator   | Supported data types with the Java analysis engine | Supported data types with the SQL analysis engine
Benford Law | Number, Text                                       | Number, Text