The Police Killings Data Mining Project contains the data behind the story "Where Police Have Killed Americans In 2015".

Data-Mining-R-Project

Aim of the Project

The purpose of this data mining project is to better understand the people killed by police in the United States in 2015: how they died and why they were killed. According to some sources, the United States has a much higher rate of police killings than comparable Western countries. With this in mind, we try to clarify the subject using the information in the data. A particular focus is race and ethnicity, and how this variable changes the likelihood of becoming a victim of a police killing. We use classification and clustering methods for this purpose. I analyzed the data, together with the provided feature descriptions, using the R language.

This project consists of the following steps:

1. Data Preprocessing
-> Preprocess the data (handle missing values, noise, etc.). Plot the data using appropriate visualizations (histograms, scatter plots, mosaic plots, etc.).

2. Feature Selection and Dimensionality Reduction
-> Choose the features that are important for the analysis.

3. Applying Clustering Algorithms
-> Apply appropriate clustering algorithms, such as K-means and hierarchical (agglomerative) clustering, to generate clusters. Choose the best number of clusters. At the end of the process, each record has a cluster label. Analyze the properties of each cluster.

4. Applying Classification Algorithms
-> Apply classification algorithms, such as Decision Trees and an Artificial Neural Network (ANN), to build a model that predicts the cluster label.

You can get the dataset from the following link:
https://www.kaggle.com/fivethirtyeight/fivethirtyeight-police-killings-dataset

Some Examples From the Mining Process:

Data Analysis (Race Attribute)
[Figure: race attribute analysis]

Missing Data Profile
[Figure: missing data profile]
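
A missing data profile can be sketched in base R as follows. This is only an illustration on a toy data frame: the column names (`age`, `race`, `armed`) and the values are made up, and it assumes missing values are encoded as `NA` (in a real CSV they may first need converting, for example with the `na.strings` argument of `read.csv`).

```r
# Toy data frame; column names and values are hypothetical stand-ins.
victims <- data.frame(
  age   = c(16, 27, NA, 42, 35, NA),
  race  = c("White", "Black", NA, "Hispanic", "White", "Black"),
  armed = c("No", "Firearm", "Knife", "No", NA, "Firearm"),
  stringsAsFactors = FALSE
)

# Count the missing values per attribute and express them as a percentage.
missing_count <- colSums(is.na(victims))
missing_pct   <- round(100 * missing_count / nrow(victims), 1)

missing_profile <- data.frame(
  attribute = names(missing_count),
  missing   = as.integer(missing_count),
  percent   = as.numeric(missing_pct)
)
print(missing_profile)
```

Attributes with a high percentage of missing values are candidates for removal, while the rest can be imputed.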

Handling Missing Data
[Figure: handling missing data, part 1]

[Figure: handling missing data, part 2]
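
One common way to handle missing values is simple imputation. The sketch below is an assumption, not necessarily the method used in this project: it replaces missing numeric values with the median and missing categorical values with the most frequent category. The data frame, its column names and the helper `mode_value` are hypothetical.

```r
# Toy data; column names and values are hypothetical.
victims <- data.frame(
  age   = c(16, 27, NA, 42, 35, NA),
  armed = c("No", "Firearm", "No", "No", NA, "Firearm"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-missing value of a vector.
mode_value <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  names(counts)[which.max(counts)]
}

imputed <- victims
# Numeric attribute: replace NA with the median of the observed values.
imputed$age[is.na(imputed$age)] <- median(imputed$age, na.rm = TRUE)
# Categorical attribute: replace NA with the most frequent category.
imputed$armed[is.na(imputed$armed)] <- mode_value(imputed$armed)

print(imputed)
```

The median is used instead of the mean because it is less sensitive to outliers.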

Rounding Attributes
[Figure: rounding attributes]

-> Correlation Matrix (Initial)
[Figure: initial correlation matrix]
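
A correlation matrix for the numeric attributes can be computed with `cor()` from base R. The sketch below uses synthetic data with hypothetical attribute names (`income`, `poverty`, `age`). The 0.7 threshold for flagging strongly correlated pairs is an arbitrary choice for illustration.

```r
set.seed(1)
n <- 50

# Synthetic numeric attributes; names and values are hypothetical.
income  <- rnorm(n, mean = 50000, sd = 10000)
poverty <- 40 - income / 2500 + rnorm(n, sd = 2)  # built to correlate negatively with income
age     <- rnorm(n, mean = 35, sd = 10)           # generated independently of the other two
numeric_data <- data.frame(income, poverty, age)

# Pearson correlation matrix; use = "complete.obs" drops rows that contain NA.
cor_matrix <- round(cor(numeric_data, use = "complete.obs"), 2)
print(cor_matrix)

# List attribute pairs whose absolute correlation exceeds the threshold.
threshold <- 0.7
high <- which(abs(cor_matrix) > threshold & upper.tri(cor_matrix), arr.ind = TRUE)
high_pairs <- data.frame(
  attr1 = rownames(cor_matrix)[high[, "row"]],
  attr2 = colnames(cor_matrix)[high[, "col"]],
  r     = cor_matrix[high]
)
print(high_pairs)
```

When two attributes are strongly correlated, one of them usually carries little extra information and can be considered for removal.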

Removing Unnecessary Attributes
[Figure: removing unnecessary attributes]

-> Correlation Matrix (After Cleaning)
[Figure: correlation matrix after cleaning]

Converting Categorical Attributes to Numeric
[Figure: converting categorical attributes to numeric]
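
Two usual ways to convert a categorical attribute to numbers are integer codes and one-hot (dummy) columns. The sketch below shows both on a toy data frame. The column names and values are hypothetical, and the source does not state which encoding the project used.

```r
# Toy data; column names and values are hypothetical.
victims <- data.frame(
  race  = c("White", "Black", "Hispanic", "White", "Black"),
  armed = c("No", "Firearm", "Knife", "No", "Firearm"),
  stringsAsFactors = FALSE
)

# Option 1: integer codes. factor() sorts the levels alphabetically by default,
# so here Black = 1, Hispanic = 2, White = 3.
race_codes <- as.integer(factor(victims$race))
print(race_codes)

# Option 2: one-hot encoding. The "- 1" removes the intercept, so each
# category of `armed` gets its own 0/1 column.
armed_onehot <- model.matrix(~ armed - 1, data = victims)
print(armed_onehot)
```

Integer codes impose an artificial order on the categories, which distance-based methods such as K-means treat as meaningful. One-hot columns avoid this at the cost of more attributes.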

Feature Selection / Dimensionality Reduction

-> Random Forest
[Figure: random forest]

-> Recursive Feature Elimination (RFE)
[Figure: recursive feature elimination]

-> Principal Component Analysis (PCA)
[Figure: principal component analysis]
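
PCA is available in base R through `prcomp()`. The sketch below runs it on synthetic data in which two of the three attributes are nearly identical, so two components are enough. The attribute names and the 95% variance target are assumptions made for illustration.

```r
set.seed(42)
n <- 100
base <- rnorm(n)

# Synthetic data: x1 and x2 are almost copies of each other, x3 is independent.
numeric_data <- data.frame(
  x1 = base + rnorm(n, sd = 0.1),
  x2 = base + rnorm(n, sd = 0.1),
  x3 = rnorm(n)
)

# Center and scale so that every attribute contributes on the same scale.
pca <- prcomp(numeric_data, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component, and the running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)
print(round(cum_var, 3))

# Keep the smallest number of components that reaches the variance target.
n_comp  <- which(cum_var >= 0.95)[1]
reduced <- pca$x[, seq_len(n_comp), drop = FALSE]
print(dim(reduced))
```

The `reduced` matrix can then be passed to the clustering step in place of the original attributes.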

-> Boruta Algorithm
[Figure: Boruta algorithm]

Dimensionality Reduction
[Figure: dimensionality reduction]

Elbow Method to find Number of Clusters
[Figure: elbow method]
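
The elbow method runs K-means for several values of k and looks for the point where the total within-cluster sum of squares stops dropping sharply. The sketch below uses synthetic two-dimensional data with three well-separated groups. The data and the range of k are assumptions for illustration.

```r
set.seed(123)

# Three well-separated synthetic groups in two dimensions.
centers <- matrix(c(0, 0,
                    5, 5,
                    0, 5), ncol = 2, byrow = TRUE)
points <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(30, centers[i, 1], 0.5), rnorm(30, centers[i, 2], 0.5))
}))
points <- scale(points)

# Total within-cluster sum of squares for k = 1..8.
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(points, centers = k, nstart = 20)$tot.withinss
})
print(data.frame(k = k_values, wss = round(wss, 2)))

# Uncomment to draw the elbow curve:
# plot(k_values, wss, type = "b", xlab = "Number of clusters k",
#      ylab = "Total within-cluster sum of squares")
```

For this toy data the curve flattens after k = 3, which matches the number of groups that were generated.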

K-Means Clustering
[Figure: K-means clustering]
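
Once the number of clusters is chosen, K-means assigns a cluster label to every record, and the clusters can be profiled by averaging their attributes. The following is a minimal sketch on synthetic data. The attribute names `age` and `income`, the values and the choice of two clusters are hypothetical.

```r
set.seed(7)

# Synthetic records forming two clearly different groups.
records <- data.frame(
  age    = c(rnorm(40, 25, 3), rnorm(40, 50, 3)),
  income = c(rnorm(40, 30000, 3000), rnorm(40, 60000, 3000))
)

# Scale first so that income does not dominate the distance computation.
fit <- kmeans(scale(records), centers = 2, nstart = 25)

# Attach the cluster label to each record.
records$cluster <- fit$cluster
print(table(records$cluster))

# Profile the clusters: mean of each attribute per cluster.
profile <- aggregate(cbind(age, income) ~ cluster, data = records, FUN = mean)
print(profile)
```

The `nstart = 25` argument reruns the algorithm from several random starts and keeps the best result, which makes the outcome less dependent on the initial centers.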

Hierarchical Clustering (Agglomerative Method)
[Figure: hierarchical clustering]
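
Agglomerative clustering is available in base R through `dist()`, `hclust()` and `cutree()`. The sketch below uses synthetic data. The Ward linkage (`"ward.D2"`) is an assumption, since the linkage used in the project is not stated here.

```r
set.seed(7)

# Two synthetic groups of 20 points each in two dimensions.
points <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2)
)

# Pairwise Euclidean distances on the scaled data.
d <- dist(scale(points), method = "euclidean")

# Agglomerative clustering: start from single points and merge step by step.
hc <- hclust(d, method = "ward.D2")

# Cut the dendrogram into two clusters to obtain a label per record.
hc_labels <- cutree(hc, k = 2)
print(table(hc_labels))

# Uncomment to draw the dendrogram:
# plot(hc, labels = FALSE, main = "Dendrogram")
```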

Comparison of K-Means and Hierarchical Clustering
-> K-Means
[Figure: K-means result]

-> Hierarchical Clustering
[Figure: hierarchical clustering result]
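
One simple way to compare the two methods is to cross-tabulate their cluster labels for the same records. This is a sketch on synthetic data, not the comparison performed in the project. The data and the Ward linkage are assumptions.

```r
set.seed(11)

# Two synthetic groups of 30 points each, scaled before clustering.
points <- scale(rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2)
))

km_labels <- kmeans(points, centers = 2, nstart = 25)$cluster
hc_labels <- cutree(hclust(dist(points), method = "ward.D2"), k = 2)

# Cross-tabulation of the two labelings. Cluster numbers are arbitrary,
# so the methods agree when each row has exactly one non-zero cell.
agreement <- table(kmeans = km_labels, hclust = hc_labels)
print(agreement)
```

On real data the table is rarely this clean, and the cells outside the dominant pairing show the records on which the two methods disagree.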

Artificial Neural Network (ANN)
[Figure: artificial neural network]
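
Training a neural network to predict the cluster label can be sketched as follows. This example uses the `nnet` package, which ships with standard R distributions. That choice is an assumption, and the project may have used a different package. The data, the attribute names `x1` and `x2`, the 70/30 split and the network size are all hypothetical.

```r
library(nnet)  # single-hidden-layer neural networks

set.seed(2019)
n <- 60

# Synthetic records with a known cluster label as the target.
records <- data.frame(
  x1 = c(rnorm(n, 0, 0.5), rnorm(n, 3, 0.5)),
  x2 = c(rnorm(n, 0, 0.5), rnorm(n, 3, 0.5)),
  cluster = factor(rep(c("1", "2"), each = n))
)

# Hold out 30% of the records for testing.
train_idx <- sample(nrow(records), size = round(0.7 * nrow(records)))
train <- records[train_idx, ]
test  <- records[-train_idx, ]

# One hidden layer with 3 units; decay adds weight regularization.
ann <- nnet(cluster ~ x1 + x2, data = train,
            size = 3, decay = 0.01, maxit = 200, trace = FALSE)

# Predict the cluster label of the held-out records and evaluate.
pred <- predict(ann, newdata = test, type = "class")
confusion <- table(actual = test$cluster, predicted = pred)
accuracy  <- mean(pred == test$cluster)
print(confusion)
print(accuracy)
```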

Decision Trees (DT)
[Figure: decision tree]
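
A decision tree that predicts the cluster label can be sketched with the `rpart` package, which ships with standard R distributions. Whether the project used `rpart` is an assumption, and the data, the attribute names and the 70/30 split are hypothetical.

```r
library(rpart)  # recursive partitioning (CART-style trees)

set.seed(2019)
n <- 60

# Synthetic records with a known cluster label as the target.
records <- data.frame(
  x1 = c(rnorm(n, 0, 0.5), rnorm(n, 3, 0.5)),
  x2 = c(rnorm(n, 0, 0.5), rnorm(n, 3, 0.5)),
  cluster = factor(rep(c("1", "2"), each = n))
)

# Hold out 30% of the records for testing.
train_idx <- sample(nrow(records), size = round(0.7 * nrow(records)))
train <- records[train_idx, ]
test  <- records[-train_idx, ]

# method = "class" builds a classification tree.
tree <- rpart(cluster ~ x1 + x2, data = train, method = "class")
print(tree)

# Predict the cluster label of the held-out records and evaluate.
pred <- predict(tree, newdata = test, type = "class")
confusion <- table(actual = test$cluster, predicted = pred)
accuracy  <- mean(pred == test$cluster)
print(confusion)
print(accuracy)
```

Unlike the neural network, the printed tree shows the split rules directly, which makes it easier to explain why a record falls into a given cluster.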

CONCLUSION
During this project we learned a lot about data mining, because it required a great deal of research. Although we could not obtain exact results, we gained useful knowledge and experience. Preprocessing the data took most of my time: the algorithms can only be applied correctly once the data has been cleaned properly. Another finding is that unarmed people account for a certain share of the records. I learned a lot of interesting things from this assignment, and it was a really good experience for me.

REFERENCES
[1] https://www.theguardian.com/us-news/ng-interactive/2015/jun/01/the-counted-police-killings-us-database
The Counted is a project by the Guardian working to count the number of people killed by police and other law enforcement agencies in the United States throughout 2015 and 2016, to monitor their demographics and to tell the stories of how they died.

[2] https://www.kaggle.com/fivethirtyeight/fivethirtyeight-police-killings-dataset
This web page contains the dataset behind the story "Where Police Have Killed Americans In 2015".

[3] https://www.kaggle.com/stevechadwick/police-killings-analysis/
This web page contains useful information about the dataset.

[4] https://fivethirtyeight.com/features/where-police-have-killed-americans-in-2015/
A very nice article by Ben Casselman, "Where Police Have Killed Americans In 2015", published June 3, 2015. Ben Casselman is a senior editor and the chief economics writer for FiveThirtyEight.

[5] https://www.analyticsindiamag.com/data-preprocessing-with-r-hands-on-tutorial/
Basics of data preprocessing in R, a tutorial for beginners.

[6] https://cran.r-project.org/web/packages/
The official R package repository. I read about many of the packages (especially randomForest).

[7] https://www.analyticsvidhya.com/blog/2017/09/creating-visualizing-neural-network-in-r/
Artificial Neural Network Example

License

Data Mining R Project is licensed under the MIT license. See LICENSE for more information.

Project Status

You can download the latest release from this repository.

Disclaimer

This project was prepared and shared for educational purposes only. You can use or edit any file as you wish :)

About

Süha TANRIVERDİ, Çankaya University, Computer Engineering

Garbage in, garbage out