The Benford Law indicator (first-digit law) is
based on examining the actual frequency of the leading digits `1` through
`9` in numerical data. It is usually used as an indicator of
accounting and expense fraud in lists or tables.

Benford's law states that in lists and tables the digit `1`
tends to occur as a leading digit about 30% of the time. Larger digits occur as
leading digits with lower frequency, for example the digit `2`
about 17%, the digit `3` about 12% and so on. Valid, unaltered
data is expected to follow this frequency closely. A simple comparison of the first-digit
frequency distribution of the data you analyze with the expected distribution
according to Benford's law should reveal any anomalous results.
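
The expected percentages come from the formula `P(d) = log10(1 + 1/d)` for a leading digit `d`. The following minimal Python sketch tabulates them. The function name `benford_expected` is illustrative only and is not part of any Talend API.

```python
import math

def benford_expected():
    """Return the expected share of each leading digit 1-9 under Benford's law."""
    # P(d) = log10(1 + 1/d); the nine probabilities sum to 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

expected = benford_expected()
for digit, share in expected.items():
    print(f"{digit}: {share:.1%}")
```

The first three lines of output are 30.1%, 17.6% and 12.5%, which correspond to the approximate figures quoted above.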

For example, let's assume an employee has committed fraud by creating and sending
payments to a fictitious vendor. Since the amounts of these fictitious payments are
made up rather than occurring naturally, the leading digit distribution of all
fictitious and valid transactions (mixed together) will no longer follow Benford's
law. Furthermore, assume many of these fraudulent payments have
`2` as the leading digit, such as 29, 232 or 2,187. By using
the Benford Law indicator to analyze such data, you should see the amounts that have
the leading digit `2` occur more frequently than the usual
occurrence pattern of 17%.
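
The scenario above can be sketched in Python under some loudly stated assumptions: the "legitimate" amounts are a synthetic geometric series (which follows Benford's law reasonably well), the fabricated payments reuse the three amounts from the text, and the 5-point `THRESHOLD` is an arbitrary illustrative cut-off rather than a statistical test. None of the names below come from Talend.

```python
import math
from collections import Counter

def first_digit(value):
    """Return the first non-zero digit of a number, or None if there is none."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

def observed_shares(values):
    """Share of each leading digit 1-9 among the values that have one."""
    digits = [d for d in map(first_digit, values) if d is not None]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

# Hypothetical "legitimate" amounts: a geometric series spread over several
# orders of magnitude, which follows Benford's law reasonably well.
legit = [round(10 * 1.05 ** k, 2) for k in range(200)]
# Hypothetical fabricated payments, all starting with 2 (amounts from the text).
fake = [29, 232, 2187] * 14

observed = observed_shares(legit + fake)
THRESHOLD = 0.05  # illustrative cut-off, not a statistical test
flagged = []
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    if observed[d] - expected > THRESHOLD:
        flagged.append(d)
    print(f"{d}: observed {observed[d]:.1%}, expected {expected:.1%}")
print("Over-represented leading digits:", flagged)
```

With this synthetic data, only the digit `2` exceeds its expected share by more than the threshold. Note that `first_digit` returns the first significant digit, which may differ from how the indicator treats values that begin with `0`.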

- Make sure that the numerical data you analyze do not start with `0`, as Benford's law expects the leading digit to range only from `1` to `9`. This can be verified by using the number > Integer values pattern on the column you analyze.
- Check the order of magnitude of the data either by selecting the min and max value indicators or by using the Order of Magnitude indicator you can import from Talend Exchange. This is because Benford's law tends to be most accurate when values are distributed across multiple orders of magnitude. For more information about importing indicators from Talend Exchange, see Importing user-defined indicators from Talend Exchange.
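
Outside the product, the two checks above can be approximated with a short Python sketch. The helper names are hypothetical, and defining the spread as `log10(max / min)` is an assumption: the Order of Magnitude indicator may compute it differently.

```python
import math

def starts_with_zero(values):
    """Return the values whose textual form begins with the digit 0."""
    return [v for v in values if str(v).lstrip("+-").startswith("0")]

def orders_of_magnitude(values):
    """Approximate spread as log10(max / min) over the non-zero absolute values.

    This definition is an assumption; the Order of Magnitude indicator
    may compute the spread differently.
    """
    magnitudes = [abs(v) for v in values if v != 0]
    return math.log10(max(magnitudes) / min(magnitudes))

amounts = [12, 250, 0.5, 4300, 98000]  # hypothetical sample column
print(starts_with_zero(amounts))               # values to review before the analysis
print(round(orders_of_magnitude(amounts), 1))  # spread in powers of ten
```

In this sample, `0.5` is reported as starting with `0`, and the values span a little over five powers of ten.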

In the result chart of the Benford Law indicator,
digits `1` through `9` are represented by bars
and the height of the bar is the percentage of the first-digit frequency
distribution of the analyzed data. The dots represent the expected first-digit
frequency distribution according to Benford's law.

Below is an example of the results of an analysis after using the Benford Law indicator and the Order
of Magnitude user-defined indicator on a
`total_sales` column.

The first chart shows that the analyzed data varies over `5`
orders of magnitude, that is, the maximal value of the numerical column has
`5` more digits than the minimal value.

The second chart shows that the actual distribution of the data (height of bars)
does not follow Benford's law (dot values). The frequency distribution of the sales
figures differs substantially from the expected distribution according to Benford's
law. For example, sales figures that start with `1` are expected to make up about 30%
of the data, but they represent only 20% of the analyzed data. Fraud could be
suspected here: sales figures may have been modified by someone, or some data may be
missing.

Below is another example of the result chart of a column analysis after using the Benford Law indicator.

The red bar labeled as invalid shows the percentage of the analyzed data that
does not start with a digit, and the `0` bar shows the percentage of data that
starts with `0`. Neither case is expected when analyzing columns using the
Benford Law indicator, which is why both bars are shown in red.
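
The bucketing behind such a chart can be illustrated with a minimal Python sketch. It assumes each value is classified by the first character of its trimmed text, so signed, empty, null and non-numeric values all count as invalid. The indicator's actual handling of such cases is not documented here, and `chart_buckets` is a hypothetical name.

```python
from collections import Counter

def chart_buckets(raw_values):
    """Count raw values per chart bar: '1'-'9', '0', or 'invalid'."""
    counts = Counter()
    for raw in raw_values:
        text = "" if raw is None else str(raw).strip()
        first = text[:1]
        if first and first in "123456789":
            counts[first] += 1
        elif first == "0":
            counts["0"] += 1
        else:
            # Empty, null, signed, or non-numeric text: no leading digit.
            counts["invalid"] += 1
    return counts

sample = ["120", "0.5", "N/A", "245", "31", "1999"]  # hypothetical column values
counts = chart_buckets(sample)
for bar in list("123456789") + ["0", "invalid"]:
    share = counts[bar] / len(sample)
    print(f"{bar:>7}: {share:.1%}")
```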

The following table shows the indicators that you can select in any database:

Data type | Number | Number | Text | Text | Date | Date | Others | Others |
---|---|---|---|---|---|---|---|---|
Analysis engine type | Java | SQL | Java | SQL | Java | SQL | Java | SQL |
Benford Law | | | | | | | | |