The 2021 Tamil Nadu Legislative Assembly election, covering all 234 constituencies of the Indian state, was the first assembly election held after the deaths of J. Jayalalithaa and M. Karunanidhi, the two most prominent Chief Ministers in the state's modern history. Following the loss of these two leaders of the major parties, Tamil Nadu held its much-anticipated 16th legislative election in a single phase on 6 April 2021, and the Election Commission of India (ECI) released the results on 2 May 2021. Industries, especially MSMEs, had been hit hard by the economic slowdown, and both the AIADMK and the DMK promised jobs in their manifestos to revive the economy after the COVID-19 pandemic. These factors motivated us to take up this topic and analyze it.

The analysis uses the "TCPD Indian Elections Data v2.0" dataset for the Tamil Nadu General Legislative Election 2021. It examines the list of participating political parties, the performance of those parties, the performance of women candidates, the percentage of votes cast, and other related data. It also includes a logistic regression model with 98.72% accuracy and visualizations of the results.
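Tamil Nadu Assembly Election 2021: Benford's Law predicts that, in many naturally occurring datasets, a number's leading digit d appears with probability log10(1 + 1/d), so smaller leading digits are more common. To my surprise, this dataset did not quite follow the expectations of Benford's Law. While Benford's Law can raise a flag for further investigation, it is crucial to remember that drawing conclusions based solely on this law can be misleading.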
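As a rough illustration of what the comparison involves, the sketch below extracts leading digits from a list of vote counts and sets them against the Benford proportions. The function names and the `sample_votes` values are hypothetical and are not taken from the TCPD dataset or from the original analysis.

```python
import math
from collections import Counter


def leading_digit(n):
    """Return the first digit of a positive integer."""
    return int(str(int(n))[0])


def benford_expected():
    """Expected proportion of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def observed_proportions(values):
    """Observed proportion of each leading digit, ignoring non-positive values."""
    digits = [leading_digit(v) for v in values if int(v) > 0]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Hypothetical vote counts, for illustration only (not real election data).
sample_votes = [93802, 1204, 187, 45231, 9120, 310, 2764, 118, 60533, 15]

expected = benford_expected()
observed = observed_proportions(sample_votes)
for d in range(1, 10):
    print(f"digit {d}: observed {observed[d]:.3f}, expected {expected[d]:.3f}")
```

With real data, `sample_votes` would be replaced by the vote-count column of the dataset.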
Chi-square test:
- Chi-square statistic: 91.76589555999786
- P-value: 2.0360255093931798e-16

Z-test: z-scores for each leading digit:
- Leading digit 1: 3.912656542225844
- Leading digit 2: -4.775474494912755
- Leading digit 3: -4.7594418550282
- Leading digit 4: -1.8414947626013112
- Leading digit 5: -1.6145961318133277
- Leading digit 6: 1.2875237577789311
- Leading digit 7: 2.3106555918591574
- Leading digit 8: 4.127203821984585
- Leading digit 9: 3.8646675727844104
Interpretation: The chi-square statistic is a measure of how much the observed leading digit frequencies differ from the expected frequencies based on Benford's Law. The very low p-value indicates that the differences are statistically significant, suggesting that the observed leading digit distribution is significantly different from what would be expected according to Benford's Law.
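A chi-square goodness-of-fit test of this kind can be sketched as follows. This is a minimal standard-library version, not necessarily the code behind the figures above. It assumes nine digit categories and therefore 8 degrees of freedom, and it uses the closed-form chi-square tail probability that holds only for an even number of degrees of freedom. The function names and the example counts are hypothetical.

```python
import math

# Benford proportions for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def benford_chi_square(observed_counts):
    """Return (statistic, p-value) for nine leading-digit counts vs. Benford."""
    n = sum(observed_counts)
    expected = [n * p for p in BENFORD]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    return stat, chi_square_sf_even_df(stat, df=8)


# Hypothetical counts for digits 1..9 (uniform, so far from Benford).
stat, p_value = benford_chi_square([100] * 9)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.3g}")
```

Passing the reported statistic of 91.76589555999786 to `chi_square_sf_even_df` with `df=8` gives a tail probability of roughly 2.04e-16, which is consistent with the p-value reported above.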
The z-scores indicate how many standard deviations the observed proportion of each leading digit lies from the proportion expected under Benford's Law:
- A positive z-score (e.g., for leading digits 1, 6, 7, 8, and 9) indicates that the observed proportion is higher than expected based on Benford's Law.
- A negative z-score (e.g., for leading digits 2, 3, 4, and 5) indicates that the observed proportion is lower than expected based on Benford's Law.
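The exact z-score formula used for the values above is not stated. One common choice, shown here as an assumption, is a one-proportion z-test per digit without continuity correction: z = (p_obs - p_exp) / sqrt(p_exp * (1 - p_exp) / n). The function name and the example counts are hypothetical.

```python
import math


def benford_z_scores(observed_counts):
    """Per-digit z-scores for nine leading-digit counts against Benford's Law.

    Assumes a one-proportion z-test without continuity correction.
    """
    n = sum(observed_counts)
    scores = {}
    for d, count in enumerate(observed_counts, start=1):
        p_exp = math.log10(1 + 1 / d)           # Benford proportion for digit d
        p_obs = count / n                       # observed proportion
        se = math.sqrt(p_exp * (1 - p_exp) / n)  # standard error under Benford
        scores[d] = (p_obs - p_exp) / se
    return scores


# Hypothetical counts for digits 1..9, for illustration only.
z = benford_z_scores([100] * 9)
for d, score in z.items():
    flag = "outside" if abs(score) > 1.96 else "within"
    print(f"digit {d}: z = {score:+.2f} ({flag} the 95% band)")
```

Under this convention, an absolute z-score above 1.96 marks a digit whose proportion differs from the Benford expectation at the 5% level.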
The results of our analysis suggest that the vote counts in the election dataset do not follow Benford's Law. This could indicate anomalies or irregularities in the data collection or reporting process. However, Benford's Law is not a definitive test for fraud, and other factors could explain the deviation from the expected distribution. We therefore recommend further investigation and verification of the data sources and methods before drawing any conclusions.