# Fraud Detection - 7.3

## Talend Open Studio User Guide

- Version: 7.3
- Language: English
- Product: Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for Data Quality, Talend Open Studio for ESB
- Module: Talend Studio
- Content: Design and Development
- Last publication date: 2023-10-11

Available in: Open Studio for Data Quality

The Benford Law indicator (first-digit law) examines the actual frequency of the leading digits 1 through 9 in numerical data. It is commonly used to detect possible accounting and expense fraud in lists or tables of figures.

Benford's law states that in many lists and tables of naturally occurring numbers, the digit 1 occurs as the leading digit about 30% of the time. Larger digits occur as the leading digit less frequently: for example, the digit 2 about 17%, the digit 3 about 12.5%, and so on. Valid, unaltered data that spans several orders of magnitude, such as financial transactions, tends to follow this expected frequency. Comparing the first-digit frequency distribution of the data you analyze with the distribution expected under Benford's law should therefore reveal any anomalous results.
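The percentages above come from the formula P(d) = log10(1 + 1/d) for each leading digit d. The following minimal Python sketch computes that expected distribution; the function name `benford_expected` is illustrative and is not part of Talend Studio.

```python
import math
from typing import Dict


def benford_expected() -> Dict[int, float]:
    """Return the expected share (0-1) of each leading digit 1-9 under Benford's law."""
    # P(d) = log10(1 + 1/d); the nine shares sum to log10(10) = 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, share in benford_expected().items():
        print(f"{digit}: {share:.1%}")
```

Running it prints 30.1% for the digit 1, 17.6% for the digit 2, 12.5% for the digit 3, down to 4.6% for the digit 9.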

For example, suppose an employee commits fraud by creating and sending payments to a fictitious vendor. Because the amounts of these fictitious payments are made up rather than occurring naturally, the leading-digit distribution of all transactions, fictitious and valid mixed together, no longer follows Benford's law. Assume further that many of these fraudulent payments have 2 as the leading digit, such as 29, 232, or 2,187. When you analyze such data with the Benford Law indicator, amounts with the leading digit 2 should occur noticeably more often than the expected 17%.
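To see how such a skew shows up, the following Python sketch extracts the first significant digit of each amount and computes the observed share of each digit 1 through 9. The helper names `leading_digit` and `first_digit_shares` and the sample amounts are hypothetical; the sketch only illustrates the comparison and is not how the Talend indicator is implemented.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional


def leading_digit(amount: float) -> Optional[int]:
    """Return the first significant digit (1-9) of a non-zero number, else None."""
    if amount == 0:
        return None
    # Scientific notation starts with the first significant digit,
    # e.g. 2187 -> '2.187000e+03' and 0.3 -> '3.000000e-01'.
    return int(f"{abs(amount):e}"[0])


def first_digit_shares(amounts: Iterable[float]) -> Dict[int, float]:
    """Return the observed share (0-1) of each leading digit 1-9."""
    digits = [d for d in (leading_digit(a) for a in amounts) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


if __name__ == "__main__":
    # Hypothetical payments, several of them made-up amounts starting with 2.
    payments = [29, 232, 2187, 1450, 310, 18, 2900, 265]
    observed = first_digit_shares(payments)[2]
    expected = math.log10(1 + 1 / 2)
    print(f"digit 2: {observed:.1%} observed vs {expected:.1%} expected")
```

With these hypothetical amounts, five of the eight values start with 2, so the script reports 62.5% observed against about 17.6% expected.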

When using the Benford Law indicator, verify that the numerical data you analyze does not start with 0, because Benford's law expects the leading digit to range only from 1 to 9. You can check this by applying the number > Integer values pattern to the analyzed column.
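If you also want to pre-check an extract of the column outside Talend Studio, a minimal Python sketch could flag the values that start with 0. It assumes the values are read as text, and the function name `values_starting_with_zero` is hypothetical; it is not equivalent to the number > Integer values pattern.

```python
from typing import Iterable, List


def values_starting_with_zero(values: Iterable[str]) -> List[str]:
    """Return the values whose first non-blank character is '0'."""
    # Surrounding whitespace is ignored; everything else is compared as text.
    return [raw for raw in values if raw.strip().startswith("0")]


if __name__ == "__main__":
    print(values_starting_with_zero(["123", "0045", "0.5", "7"]))
```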

In the result chart of the Benford Law indicator, the digits 1 through 9 are represented by bars, and the height of each bar is the percentage of analyzed values that start with that digit. The dots represent the expected first-digit frequency distribution according to Benford's law.

Below is an example of the results of an analysis that uses the Benford Law indicator and the Order of Magnitude user-defined indicator on a total_sales column. The chart shows that the actual distribution of the data (height of the bars) does not follow Benford's law (dot values): the frequency distribution of the sales figures differs greatly from the expected distribution. For example, sales figures that start with 1 are expected to make up about 30% of the data, but they represent only 20% of the analyzed data. This suggests possible fraud: sales figures may have been modified, or some data may be missing.
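One simple way to quantify such differences is to subtract the expected share from the observed share for each digit. The following Python sketch does this and keeps only the digits whose gap exceeds a threshold; the function name `benford_deviation` and the 5-percentage-point default threshold are hypothetical choices for illustration, not Talend settings.

```python
import math
from typing import Dict


def benford_deviation(observed: Dict[int, float], threshold: float = 0.05) -> Dict[int, float]:
    """Return observed minus expected share for each digit whose gap exceeds threshold.

    observed maps digits 1-9 to shares between 0 and 1 (missing digits count as 0).
    threshold is a hypothetical cut-off, not a Talend setting.
    """
    flagged = {}
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)  # Benford's expected share for digit d
        gap = observed.get(d, 0.0) - expected
        if abs(gap) > threshold:
            flagged[d] = gap
    return flagged
```

With the figures quoted above for the total_sales column, the digit 1 has an observed share of 0.20 against an expected share of about 0.301, a gap of roughly -0.10, which exceeds the default threshold.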

Below is another example of the result chart of a column analysis that uses the Benford Law indicator. The red bar labeled invalid shows the percentage of the analyzed data that does not start with a digit. The 0 bar shows the percentage of data that starts with 0. Neither case is expected when analyzing columns with the Benford Law indicator, which is why both bars are shown in red.
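The following Python sketch groups raw values into the same kinds of buckets as this chart (an invalid bucket, a 0 bucket, and the digits 1 to 9) and returns the percentage of values in each. The function name is hypothetical, and the sketch assumes that any value whose first non-blank character is not an ASCII digit, including a minus sign or an empty string, counts as invalid; Talend's exact rules may differ.

```python
from collections import Counter
from typing import Dict, Iterable

DIGITS = set("0123456789")


def leading_char_buckets(values: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of values per bucket: 'invalid', '0', and '1' to '9'."""
    counts: Counter = Counter()
    total = 0
    for raw in values:
        first = raw.strip()[:1]
        # Assumption: anything not starting with an ASCII digit
        # (letters, signs, empty strings) is counted as invalid.
        counts[first if first in DIGITS else "invalid"] += 1
        total += 1
    order = ["invalid", "0"] + [str(d) for d in range(1, 10)]
    return {b: (100 * counts[b] / total if total else 0.0) for b in order}


if __name__ == "__main__":
    sample = ["123", "0.5", "abc", "", "2", "19"]  # hypothetical column values
    for bucket, pct in leading_char_buckets(sample).items():
        print(f"{bucket}: {pct:.1f}%")
```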

The following table shows the indicators that you can select in any database:

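| Indicator | Number (Java engine) | Number (SQL engine) | Text (Java engine) | Text (SQL engine) | Date (Java engine) | Date (SQL engine) | Others (Java engine) | Others (SQL engine) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Benford Law |  |  |  |  |  |  |  |  |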