NYPD Data Analysis Project

This project performs general data analysis on a given dataset. The database records all of the recorded NYPD's stop incidences and the details pertaining to each stop. Some of the items recorded include the description of the suspected crime, whether the officer was wearing a uniform, the suspect's demeanor, and the location of the stop and frisks. These incidents are recorded through an app that are then uploaded to the database. Data source: NYPD's 2019 Stop, Question, and Frisk Database

Content

Main files in this project:

DataAnalysisProject(RATM).ipynb: This file loads the dataset and performs all the analysis mentioned below.
NYPD2019.csv: We do not own the dataset. The full .xls file is located at this url: https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page. However, in order to assist setup of the .ipynb project, a preprocessed version has been added to the repository.
README.md: This is a brief overview of the project repository. So references to files, techniques, concepts, and sources.

Analysis techniques

The following techniques have been utilized to perform the analysis:

Data Pre-processing
Logistic Regression
Demographic and Predictive Parity
Matplotlib

Outline of the DataAnalysisProject(RATM).ipynb file:

Section 1: Onboarding process
- Data preprocessing
  - Removal of features
  - Feature engineering
  - Dropping extraneous columns
  - Type conversion
- Reading in the NYPD dataset
Section 2: Modeling process
- Training and predicting with the model
Section 3: Fairness definitions
- Create a split_on_feature() function
- Predictive rate parity
- Demographic parity
- Errors comparison and inferences
Section 4: Conclusion
- Other possible definitions
- Things to note
- Citations

Citations

For our project, we used a handful of reliable sources, as cited below:

Citations

MLXtend:

Sebastian Raschka, MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack, The Journal of Open Source Software, 3, 24, apr, 2018,The Open Journal, doi:10.21105/joss.00638, https://joss.theoj.org/papers/by/Sebastian%20Andersen, Accessed 5 Dec 2020

SKlearn:

scikit-learn, Scikit-learn: Machine Learning in Python, Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E., Journal of Machine Learning Research, 12, 2825--2830, 2011, https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html, Accessed 4 Dec 2020

A Strategy to Split the dataset:

Craig Shallahamer, Random Shuffle Strategy To Split Your Full Dataset, jun, 2020, OraPub} doi:10.21105/joss.00638}, https://blog.orapub.com/20200630/random-shuffle-strategy-to-split-your-full-dataset.html, Accessed 4 Dec 2020

NYPD dataset:

NYPD, “Stop, Question and Frisk Data.” Publications, Reports - NYPD, www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page, Accessed 2 Dec 2020

TLDR dataset:

NYPD, “Stop-and-Frisk Data.” New York Civil Liberties Union, 11 Mar. 2020, www.nyclu.org/en/stop-and-frisk-data, Accessed 4 Dec 2020

How to handle missing values:

Kumar, Satyam. “7 Ways to Handle Missing Values in Machine Learning.” Medium, Towards Data Science, 2 Aug. 2020, towardsdatascience.com/7-ways-to-handle-missing-values-in-machine-learning-1a6326adf79e, Accessed 7 Dec 2020

Feature Engineering:

“Representation: Feature Engineering | Machine Learning Crash Course.” Google, Google, developers.google.com/machine-learning/crash-course/representation/feature-engineering, Accessed 7 Dec 2020

Create Fit Model:

Fairness Definition:

Zhong, Ziyuan. “A Tutorial on Fairness in Machine Learning.” Medium, Towards Data Science, 19 June 2020, towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb, Accessed 8 Dec 2020

NYPD Corruption and Misconduct data:

“New York City Police Department Corruption and Misconduct.” Wikipedia, Wikimedia Foundation, 1 Dec. 2020, en.wikipedia.org/wiki/New_York_City_Police_Department_corruption_and_misconduct, Accessed 8 Dec 2020

Affects on minorities:

Chapman, Ben, and Katie Honan. “NYPD's Stop-and-Frisk Practice Still Affects Minorities in New York City.” The Wall Street Journal, Dow Jones & Company, 18 Nov. 2019, www.wsj.com/articles/nypds-stop-and-frisk-practice-still-affects-minorities-in-new-york-city-11574118605, Accessed 8 Dec 2020

Police self prophecy:

“From the President: Predictive Policing: The Modernization of Historical Human Injustice.” NACDL, www.nacdl.org/Article/September-October2017-FromthePresidentPredictivePo, Accessed 9 Dec 2020

NY census:

“U.S. Census Bureau QuickFacts: New York.” Census Bureau QuickFacts, www.census.gov/quickfacts/NY, Accessed 9 Dec 2020

Authors

Chidvi Doddi - DoddiC
Diana Bui - dianadbui
Susanna Morin - codeswitch

DoddiC/NYPD-Data-Analysis-Project

NYPD Data Analysis Project

Content

Analysis techniques

Outline of the DataAnalysisProject(RATM).ipynb file:

Citations

Citations

Authors