The Benford Law indicator (first-digit law) examines how often each of the digits 1 through 9 actually occurs as the leading digit in numerical data. It is commonly used as an indicator of accounting and expense fraud in lists or tables.
Benford's law states that in lists and tables the digit 1 tends to occur as the leading digit about 30% of the time. Larger digits occur as the leading digit less frequently: for example, the digit 2 about 17.6% of the time, the digit 3 about 12.5%, and so on. Valid, unaltered data is expected to follow this distribution. Comparing the first-digit frequency distribution of the data you analyze with the expected distribution according to Benford's law should reveal any anomalous results.
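The expected frequencies come from the standard formulation of Benford's law, P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The following is a minimal sketch that computes them; the function name benford_expected is only illustrative and is not part of the indicator.

```python
import math

def benford_expected():
    """Return the expected share of each leading digit 1-9 under Benford's law."""
    # P(d) = log10(1 + 1/d); the nine shares add up to 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

expected = benford_expected()
for digit, share in expected.items():
    # Print each digit with its expected share as a percentage.
    print(f"{digit}: {share:.1%}")
```

These are the values that the expected distribution in the result chart corresponds to.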
For example, let's assume an employee has committed fraud by creating and sending payments to a fictitious vendor. Since the amounts of these fictitious payments are made up rather than occurring naturally, the leading-digit distribution of all fictitious and valid transactions (mixed together) will no longer follow Benford's law. Furthermore, assume many of these fraudulent payments have 2 as the leading digit, such as 29, 232 or 2,187. When you analyze such data with the Benford Law indicator, you should see that amounts with the leading digit 2 occur more frequently than the expected rate of about 17.6%.
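The comparison described in this scenario can be sketched outside the Studio as follows. The payment amounts, the 0.10 threshold and the helper names are all hypothetical and chosen only for illustration; a sample this small is far too short for a meaningful analysis, and the indicator itself may compute its results differently.

```python
import math
from collections import Counter

# Expected leading-digit shares under Benford's law.
EXPECTED = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(value):
    """Return the first non-zero digit of a number, or None for zero."""
    digits = str(abs(value)).lstrip("0.")
    return int(digits[0]) if digits else None

def first_digit_shares(values):
    """Return the observed share of each leading digit 1-9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts[d] for d in range(1, 10))
    if total == 0:
        raise ValueError("no values with a leading digit from 1 to 9")
    return {d: counts[d] / total for d in range(1, 10)}

# Hypothetical amounts: the last six stand in for the made-up payments.
payments = [120, 1450, 17, 310, 45, 980, 1100, 29, 232, 2187, 260, 2450, 2099]
observed = first_digit_shares(payments)

# Hypothetical rule: flag digits whose observed share exceeds the
# expected share by more than 10 percentage points.
THRESHOLD = 0.10
suspicious = [d for d in range(1, 10) if observed[d] - EXPECTED[d] > THRESHOLD]

for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.1%}, expected {EXPECTED[d]:.1%}")
print("Over-represented digits:", suspicious)
```

In this made-up sample only the digit 2 exceeds its expected share by more than the threshold, which mirrors the pattern described above.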
Before you analyze a column with the Benford Law indicator:
- Make sure that the numerical data you analyze does not start with 0, as Benford's law expects the leading digit to range only from 1 to 9. You can verify this by using the number > Integer values pattern on the column you analyze.
- Check the order of magnitude of the data, either by selecting the min and max value indicators or by using the Order of Magnitude indicator that you can import from Talend Exchange. Benford's law tends to be most accurate when values are distributed across multiple orders of magnitude. For further information about importing indicators from Talend Exchange, see Importing user-defined indicators from Talend Exchange.
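As a rough illustration of these two preliminary checks, the sketch below finds values that start with 0 and estimates how many orders of magnitude the data spans. It assumes the span is the difference between the base-10 logarithms of the largest and smallest non-zero absolute values; the Order of Magnitude indicator from Talend Exchange may use a different definition. The sample values and function names are hypothetical.

```python
import math

def starts_with_zero(raw_values):
    """Return the raw (text) values whose first digit is 0."""
    return [v for v in raw_values if str(v).lstrip("+-").startswith("0")]

def orders_of_magnitude(values):
    """Estimate how many orders of magnitude separate the smallest
    and largest non-zero absolute values."""
    magnitudes = [abs(v) for v in values if v != 0]
    if not magnitudes:
        raise ValueError("at least one non-zero value is required")
    return math.log10(max(magnitudes)) - math.log10(min(magnitudes))

# Hypothetical raw column content, as text.
raw_column = ["0123", "45", "0.5", "910000"]
print("Values starting with 0:", starts_with_zero(raw_column))

# Hypothetical sales figures.
total_sales = [12, 340, 5600, 78000, 910000]
print(f"Span: {orders_of_magnitude(total_sales):.2f} orders of magnitude")
```

A span of only one or two orders of magnitude suggests that the data may not be a good candidate for a Benford analysis.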
In the result chart of the Benford Law indicator, digits 1 through 9 are represented by bars, and the height of each bar is the percentage of the analyzed data that starts with that digit. The dots represent the expected first-digit frequency distribution according to Benford's law.
Below is an example of the results of an analysis after using the Benford Law indicator and the Order of Magnitude user-defined indicator on a total_sales column.
The first chart shows that the analyzed data varies over 5 orders of magnitude, that is, the minimum and maximum values of the numerical column differ by 5 digits.
The second chart shows that the actual distribution of the data (height of bars) does not follow Benford's law (dot values). The frequency distribution of the sales figures differs greatly from the expected distribution according to Benford's law. For example, sales figures that start with 1 are expected to make up about 30% of the data, but they represent only 20% of the analyzed data. Some fraud could be suspected here: sales figures may have been modified by someone, or some data may be missing.
Below is another example of the result chart of a column analysis after using the Benford Law indicator.
The red bar labeled invalid shows the percentage of the analyzed data that does not start with a digit, and the 0 bar shows the percentage of data that starts with 0. Neither case is expected when analyzing columns with the Benford Law indicator, which is why both bars are shown in red.
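The grouping behind such a chart can be approximated with the sketch below, which assigns each raw value to a digit bar (0 through 9) or to an invalid bucket based on its first character. This is an assumption about the grouping made for illustration only; how the indicator treats signs, blanks or nulls is not specified here, and the sample values are hypothetical.

```python
from collections import Counter

def classify(value):
    """Return the first character if it is a digit 0-9, else 'invalid'."""
    text = "" if value is None else str(value).strip()
    if text and text[0] in "0123456789":
        return text[0]
    return "invalid"

def bucket_percentages(raw_values):
    """Return the percentage of values that falls into each bucket."""
    counts = Counter(classify(v) for v in raw_values)
    total = len(raw_values)
    return {bucket: 100 * n / total for bucket, n in counts.items()}

# Hypothetical raw column content.
raw_column = ["120", "0.75", "n/a", "2187", None, "19", "045", "310"]
result = bucket_percentages(raw_column)
for bucket in sorted(result):
    print(f"{bucket}: {result[bucket]:.1f}%")
```

Any non-zero share in the 0 or invalid buckets indicates that the column should be cleaned or filtered before the distribution is interpreted.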
For further information about analyzing columns, see Creating a basic analysis on a database column.