# Fraud Detection

The Benford Law indicator (first-digit law) is
based on the actual frequency of the leading digits `1` through
`9` in numerical data. It is commonly used as an indicator of
accounting and expense fraud in lists or tables.

Benford's law states that in many lists and tables of naturally occurring numbers, the digit `1`
occurs as the leading digit about 30% of the time. Larger digits occur as the
leading digit with decreasing frequency: the digit `2`
about 17%, the digit `3` about 12%, and so on. Valid, unaltered
data of this kind tends to follow this expected frequency. Comparing the first-digit
frequency distribution of the data you analyze with the distribution expected
by Benford's law should therefore reveal anomalous results.
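The percentages above come from the formula `P(d) = log10(1 + 1/d)` for each leading digit `d` from `1` to `9`. The following minimal Python sketch computes the expected distribution; the function name `benford_expected` is illustrative and not part of the product.

```python
import math


def benford_expected() -> dict[int, float]:
    """Return the expected share (0.0-1.0) of each leading digit 1-9 under Benford's law."""
    # P(d) = log10(1 + 1/d); the nine values telescope to log10(10) = 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, share in benford_expected().items():
        print(f"{digit}: {share:.1%}")
```

Run as a script, it prints 30.1% for `1`, 17.6% for `2`, 12.5% for `3`, and so on down to 4.6% for `9`.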

For example, suppose an employee has committed fraud by creating and sending payments to a
fictitious vendor. Because the amounts of these fictitious payments are made up rather than
occurring naturally, the leading-digit distribution of all transactions, fictitious and valid
mixed together, will no longer follow Benford's law. Furthermore, assume that many
of these fraudulent payments have `2` as the leading digit, such as
29, 232, or 2,187. When you analyze such data with the Benford Law indicator, you should see
amounts with the leading digit `2` occur more frequently than
the expected 17%.
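To make the example concrete, the following Python sketch counts the leading digits of a list of amounts. It is an illustration under stated assumptions, not the indicator's implementation: the function names and the payment values are hypothetical, and zero amounts are simply skipped.

```python
from collections import Counter


def leading_digit(amount: float) -> int:
    """Return the first significant digit (1-9) of a non-zero amount."""
    if amount == 0:
        raise ValueError("zero has no leading significant digit")
    # str() gives the shortest round-trip text form (e.g. '2187', '0.05', '1e-05');
    # stripping leading '0' and '.' leaves the first significant digit in front.
    return int(str(abs(amount)).lstrip("0.")[0])


def first_digit_shares(amounts: list[float]) -> dict[int, float]:
    """Share (0.0-1.0) of non-zero amounts whose leading digit is 1..9."""
    counts = Counter(leading_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


if __name__ == "__main__":
    # Hypothetical payments; several fabricated amounts start with 2.
    payments = [29, 232, 2187, 2450, 215, 1320, 1875, 940, 310, 27]
    shares = first_digit_shares(payments)
    print(f"leading digit 2: {shares[2]:.0%} (Benford expects about 17.6%)")
```

With these ten hypothetical payments, six start with `2`, so the script reports 60% against the roughly 17.6% that Benford's law predicts.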

When using the Benford Law indicator, verify that
the numerical data you analyze does not start with `0`, because Benford's law
expects the leading digit to range only from `1` to
`9`. You can check this by applying the number >
Integer values pattern to the column you analyze.
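Outside the product, the same kind of check can be approximated with a regular expression. In the Python sketch below, the regex `[1-9][0-9]*` and the function name are hypothetical stand-ins for illustration; they are not the definition of the number > Integer values pattern.

```python
import re

# Hypothetical stand-in: a plain integer whose first digit is 1-9 (no sign,
# no leading zero). This is NOT the regex behind the product's pattern.
INTEGER_NO_LEADING_ZERO = re.compile(r"[1-9][0-9]*")


def nonconforming_values(values: list[str]) -> list[str]:
    """Return values that are not integers starting with a digit from 1 to 9."""
    return [v for v in values if not INTEGER_NO_LEADING_ZERO.fullmatch(v.strip())]


if __name__ == "__main__":
    sample = ["1520", "0042", "310", "0", "N/A"]
    print(nonconforming_values(sample))  # ['0042', '0', 'N/A']
```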

In the result chart of the Benford Law indicator,
digits `1` through `9` are represented by bars,
and the height of each bar is the percentage of analyzed values whose first digit
is that digit. The dots represent the expected first-digit
frequency distribution according to Benford's law.
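To illustrate how the bars and dots relate, the following Python sketch prints a rough text version of such a chart: `#` bars show the observed first-digit share and `*` marks the share expected by Benford's law. The function name, the `width` parameter, and the input amounts are hypothetical; the product draws its own graphical chart.

```python
import math
from collections import Counter


def render_chart(amounts: list[float], width: int = 50) -> str:
    """Text version of the result chart for non-zero amounts.

    '#' bars show the observed first-digit share; '*' marks the share
    expected by Benford's law.
    """
    # First significant digit: abs() drops the sign, lstrip() drops leading zeros and '.'.
    digits = [int(str(abs(a)).lstrip("0.")[0]) for a in amounts if a != 0]
    counts = Counter(digits)
    total = len(digits) or 1  # avoid dividing by zero on empty input
    lines = []
    for d in range(1, 10):
        observed = counts[d] / total
        expected = math.log10(1 + 1 / d)
        cells = [" "] * (width + 1)
        for i in range(round(observed * width)):
            cells[i] = "#"
        cells[round(expected * width)] = "*"  # the "dot" for this digit
        lines.append(f"{d} |{''.join(cells)}| {observed:6.1%} vs {expected:5.1%}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_chart([1320, 1875, 29, 232, 2187, 2450, 215, 940, 310, 27]))
```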

Below is an example of the results of analyzing a `total_sales` column with the Benford Law indicator and the Order
of Magnitude user-defined indicator.

The chart shows that the actual distribution of the data (bar heights) does not follow
Benford's law (dots). The differences between the frequency distribution of the sales
figures and the distribution expected by Benford's law are very large.
For example, sales figures that start with `1` are expected to make up about 30% of the data,
but they represent only 20% of the analyzed data. Fraud may be suspected here: sales figures
may have been modified by someone, or some data may be missing.
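The chart is read visually. If you want a single number that quantifies how far a distribution is from Benford's law, one common option is Pearson's chi-square goodness-of-fit test; this is a separate statistical technique, not something the indicator is documented to compute. In the Python sketch below, the function name and the sample counts are hypothetical, and 15.507 is the standard chi-square critical value for 8 degrees of freedom at the 5% level.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom (9 digits - 1) at alpha = 0.05.
CHI2_CRITICAL_8DF_5PCT = 15.507


def benford_chi_square(first_digits: list[int]) -> float:
    """Pearson chi-square statistic comparing first digits 1-9 with Benford's law."""
    digits = [d for d in first_digits if 1 <= d <= 9]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one first digit between 1 and 9")
    counts = Counter(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Hypothetical 1,000 sales figures where only 20% start with 1.
    counts_per_digit = {1: 200, 2: 200, 3: 150, 4: 100, 5: 100, 6: 80, 7: 70, 8: 50, 9: 50}
    sample = [d for d, count in counts_per_digit.items() for _ in range(count)]
    stat = benford_chi_square(sample)
    verdict = "deviates from" if stat > CHI2_CRITICAL_8DF_5PCT else "is consistent with"
    print(f"chi-square = {stat:.1f}; the sample {verdict} Benford's law at the 5% level")
```

With these 1,000 hypothetical values the statistic is about 53, well above the threshold. The same percentages over only 100 values give about 5.3, below it, so the sample size matters when judging whether a gap like 20% versus 30% is significant.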

Below is another example of a result chart from a column analysis using the Benford Law indicator.

The red bar labeled invalid represents the percentage of the analyzed data that does not
start with a digit. The `0` bar represents the percentage of data that
starts with `0`. Neither case is expected when analyzing columns with the
Benford Law indicator, which is why both are shown in
red.
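The following Python sketch shows one way values could be sorted into the chart's buckets (`invalid`, `0`, and `1` through `9`). It assumes that the first non-blank character of the raw value decides the bucket, as the chart description suggests; the product's exact rules may differ, and the function name and sample column are hypothetical.

```python
from collections import Counter

DIGITS = set("0123456789")


def first_character_buckets(values: list[str]) -> dict[str, float]:
    """Percentage of values in each chart bucket: 'invalid', '0', and '1'-'9'.

    Assumption: the first non-blank character of the raw value decides the
    bucket, as the chart description suggests; the product's rules may differ.
    """

    def bucket(value: str) -> str:
        first = value.strip()[:1]
        return first if first in DIGITS else "invalid"

    counts = Counter(bucket(v) for v in values)
    total = len(values)
    labels = ["invalid", "0"] + [str(d) for d in range(1, 10)]
    return {label: (100 * counts[label] / total if total else 0.0) for label in labels}


if __name__ == "__main__":
    column = ["1520", "0042", "N/A", "", "310", "12"]
    for label, pct in first_character_buckets(column).items():
        if pct:
            print(f"{label:>7}: {pct:.1f}%")
```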

The following table shows the indicators that you can select in any database:

Indicator | Number (Java) | Number (SQL) | Text (Java) | Text (SQL) | Date (Java) | Date (SQL) | Others (Java) | Others (SQL) |
---|---|---|---|---|---|---|---|---|
Benford Law | | | | | | | | |
