/Analysing-Benford-s-Law-for-anomaly-fraud-detection-

Comparing Fibonacci numbers and the lengths of the world's rivers with Benford's law, using Matplotlib for data visualization and Pandas for data cleaning

Introduction

Benford’s Law (Sarkar, 2018), also known as the Law of First Digits or the Phenomenon of Significant Digits, is the observation that the first digits of numbers found in records from widely varied sources are not uniformly distributed. Instead, the digit “1” is the most frequent, followed by “2”, “3”, and so on with successively decreasing frequency down to “9”. In this report we evaluate Benford’s law in three case scenarios and use the findings to estimate which factors affect the accuracy of the law and how it could be applied to our data, using the following techniques and datasets.

TECHNIQUES:

• Excel Sheets
• Python Program

DATASETS:

• Fibonacci Series
• Length of Rivers

This analysis aims to apply Benford’s law to two naturally occurring datasets and to make assumptions in accordance with the study of Steven J. Miller (Miller, 2015). As mentioned above, we work with these datasets and techniques across three cases in order to consolidate our findings into a well-founded conclusion.
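As a minimal sketch of the Fibonacci comparison, the snippet below tallies the leading digits of the first 1000 Fibonacci numbers and prints them next to the Benford expectation, log10(1 + 1/d). The sample size of 1000 and the helper names `fibonacci` and `leading_digit_shares` are assumptions made for illustration; they are not taken from the notebook in this repository.

```python
import math
from collections import Counter

def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1, 2, ..."""
    values, a, b = [], 1, 1
    for _ in range(n):
        values.append(a)
        a, b = b, a + b
    return values

def leading_digit_shares(numbers):
    """Share of each leading digit 1..9 among the positive integers given."""
    counts = Counter(int(str(x)[0]) for x in numbers if x > 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Illustrative sample size, not the one used in the original notebook.
observed = leading_digit_shares(fibonacci(1000))
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed[d]:.3f}, Benford {expected:.3f}")
```

The same `leading_digit_shares` helper could be applied to a cleaned column of river lengths, provided the values are positive integers or are converted to a form whose first character is the leading digit.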

What did we observe?

As briefly explained above, Benford's Law holds that, in an unmanipulated data set, the digit 1 will be the leading digit about 30.1% of the time, the digit 2 about 17.6% of the time, and each subsequent digit, 3 through 9, will lead with decreasing frequency. Plotted by digit, these expected frequencies form a steadily decreasing curve. Based on our results, the report comprises three major findings. Following the theory presented by Benford, the study was able to validate the law on naturally occurring numbers. In case 2 we observed how increasing the number of data points improved our output.
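The percentages quoted above follow from the standard Benford formula P(d) = log10(1 + 1/d). The sketch below prints those expected shares and then illustrates the sample-size effect on the Fibonacci series. The sample sizes (25, 100, 1000) and the mean absolute deviation used as a closeness measure are assumptions chosen for illustration, not the notebook's own settings.

```python
import math

# Benford's expected share for each leading digit d: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1, 2, ..."""
    values, a, b = [], 1, 1
    for _ in range(n):
        values.append(a)
        a, b = b, a + b
    return values

def mean_abs_deviation(numbers):
    """Average absolute gap between observed and expected leading-digit shares."""
    counts = {d: 0 for d in range(1, 10)}
    for x in numbers:
        counts[int(str(x)[0])] += 1
    total = len(numbers)
    return sum(abs(counts[d] / total - BENFORD[d]) for d in counts) / 9

for d, p in BENFORD.items():
    print(f"digit {d}: {p:.1%}")

# Hypothetical sample sizes: a smaller deviation means a closer fit.
for n in (25, 100, 1000):
    mad = mean_abs_deviation(fibonacci(n))
    print(f"first {n} Fibonacci numbers: mean absolute deviation = {mad:.4f}")
```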

What are the practical implications of the result?

There have been abundant examples of researchers and corporations tampering with data; if the tampering goes undetected, the consequences can be severe, ranging from companies not paying their fair share of taxes, to hazardous medical treatments being approved, to unscrupulous researchers being funded at the expense of their honest peers, to electoral fraud and the effective disenfranchisement of voters. With a large enough data set, the laws of probability and statistics state that certain patterns should emerge. Some of these patterns are well known, and are therefore easily reproduced by individuals modifying data.
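One way such a pattern check might look in practice is a chi-square goodness-of-fit test of leading digits against Benford's expected shares. This is a sketch under stated assumptions: the chi-square test is a technique introduced here for illustration, not one the report describes, and the "evenly spread" sample is synthetic, standing in for figures whose leading digits do not follow the law. A large statistic only signals a deviation worth investigating; it is not proof of fraud.

```python
import math

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CRITICAL_5PCT_DF8 = 15.507

def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1, 2, ..."""
    values, a, b = [], 1, 1
    for _ in range(n):
        values.append(a)
        a, b = b, a + b
    return values

def chi_square_vs_benford(numbers):
    """Chi-square statistic of observed leading-digit counts against Benford."""
    counts = {d: 0 for d in range(1, 10)}
    for x in numbers:
        counts[int(str(x)[0])] += 1
    n = len(numbers)
    return sum(
        (counts[d] - n * BENFORD[d]) ** 2 / (n * BENFORD[d])
        for d in range(1, 10)
    )

samples = {
    "Fibonacci (first 1000)": fibonacci(1000),
    # Synthetic stand-in: every leading digit appears equally often.
    "evenly spread 100..999 (synthetic)": list(range(100, 1000)),
}

for name, values in samples.items():
    stat = chi_square_vs_benford(values)
    verdict = "deviates from" if stat > CRITICAL_5PCT_DF8 else "is consistent with"
    print(f"{name}: chi-square = {stat:.2f}, {verdict} Benford at the 5% level")
```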