Data-Mining-R-Project

Aim of the Project

The purpose of this data mining project is to better understand the people killed by police in the United States in 2015: how they died and the reasons given for the killings. According to several sources, the United States has a much higher rate of police killings than comparable Western countries. With this in mind, we examine what the data can tell us about the subject. In particular, we look at race and ethnicity, and at how this variable relates to the likelihood of becoming a victim of a police killing. We use classification and clustering methods for this purpose. I analyzed the data in R, using the feature descriptions provided with the dataset.

This project consists of the following steps:

1. Data Preprocessing
-> Preprocess the data (handle missing values, noise, etc.). Plot the data using appropriate visualizations (histograms, scatter plots, mosaic plots, etc.).

2. Feature Selection and Dimensionality Reduction
-> Choose features that are important for the analysis.

3. Applying Clustering Algorithms
-> Apply appropriate clustering algorithms, such as K-Means and Hierarchical (Agglomerative) Clustering, to generate clusters. Choose the best number of clusters. At the end of the process, each record has a cluster label. Analyze the properties of each cluster.

4. Applying Classification Algorithms
-> Apply classification algorithms, such as Decision Trees and an Artificial Neural Network (ANN), to build a model that predicts the cluster label.

You can get the dataset from the following link:
https://www.kaggle.com/fivethirtyeight/fivethirtyeight-police-killings-dataset

Some Examples From the Mining Process:

Data Analysis (Race Attribute)
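
A frequency table is one simple way to look at a categorical attribute such as race. The sketch here uses a small made-up data frame in place of the real dataset, and the column name `raceethnicity` is an assumption about the CSV schema rather than something confirmed by this README.

```r
# Hypothetical sample standing in for the real dataset; the column name
# `raceethnicity` is an assumption about the CSV's schema.
killings <- data.frame(
  raceethnicity = c("White", "Black", "Hispanic/Latino", "White", "Black",
                    "White", "Unknown", "Asian/Pacific Islander"),
  stringsAsFactors = FALSE
)

# Absolute and relative frequency of each category, most frequent first.
race_counts <- sort(table(killings$raceethnicity), decreasing = TRUE)
race_share  <- round(prop.table(race_counts), 3)

print(race_counts)
print(race_share)

# barplot(race_counts, las = 2) would draw the same counts as a bar chart.
```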

Missing Data Profile
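
A missing-data profile can be computed in base R by counting `NA` values per column. This is a minimal sketch on made-up data; the column names are illustrative assumptions, and if the real file marks unknown values with a string such as "Unknown", those would first need to be converted to `NA`.

```r
# Hypothetical sample; column names are illustrative assumptions.
killings <- data.frame(
  age      = c(34, NA, 27, 45, NA),
  armed    = c("No", "Firearm", NA, "Knife", "No"),
  h_income = c(52000, 41000, NA, 38000, 61000),
  stringsAsFactors = FALSE
)

# Count and percentage of missing values for every column.
missing_profile <- data.frame(
  n_missing   = colSums(is.na(killings)),
  pct_missing = round(100 * colMeans(is.na(killings)), 1)
)

print(missing_profile)
```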

Handling Missing Data
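
One common way to handle missing values is simple imputation: the median for numeric columns and the most frequent value for categorical ones. This README does not state which strategy the project actually used, so the sketch is only an illustration under that assumption; `impute_column` is a hypothetical helper and the data is made up.

```r
# Hypothetical sample; column names are illustrative assumptions.
killings <- data.frame(
  age   = c(34, NA, 27, 45, NA),
  armed = c("No", "Firearm", NA, "No", "Knife"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: median for numeric columns, mode for the rest.
impute_column <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
  } else {
    most_common <- names(which.max(table(x)))  # table() ignores NA by default
    x[is.na(x)] <- most_common
  }
  x
}

cleaned <- killings
cleaned[] <- lapply(killings, impute_column)

print(cleaned)
```

Dropping rows with `na.omit()` is an alternative when only a few records are incomplete.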

Rounding Attributes

-> Correlation Matrix (Initial)
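
A correlation matrix over the numeric attributes can be computed with base R's `cor()`. The sketch uses simulated data with hypothetical column names; `h_income` is deliberately built from `p_income` so that a strong correlation shows up.

```r
set.seed(1)
n <- 50

# Simulated numeric attributes; names and distributions are assumptions.
numeric_data <- data.frame(
  age      = rnorm(n, mean = 37, sd = 12),
  p_income = rnorm(n, mean = 24000, sd = 6000)
)
numeric_data$h_income <- 2 * numeric_data$p_income + rnorm(n, sd = 3000)

# Pairwise Pearson correlations, tolerating missing values if present.
cor_matrix <- round(cor(numeric_data, use = "pairwise.complete.obs"), 2)

print(cor_matrix)
```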

Removing Unnecessary Attributes
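
One way to decide which attributes are unnecessary is to drop identifier columns and then remove one attribute from every highly correlated pair. The 0.9 threshold, the column names, and the data here are all assumptions made for illustration; the README does not say which criteria the project applied.

```r
set.seed(2)
n <- 40

# Simulated records; `id` plays the role of an identifier column.
records <- data.frame(
  id       = seq_len(n),
  p_income = rnorm(n, mean = 24000, sd = 6000),
  age      = rnorm(n, mean = 37, sd = 12)
)
records$h_income <- 2 * records$p_income + rnorm(n, sd = 1000)

# Identifiers carry no predictive information, so remove them first.
records$id <- NULL

# Keep only the lower triangle so each pair is inspected once.
abs_cor <- abs(cor(records))
abs_cor[upper.tri(abs_cor, diag = TRUE)] <- 0

# Drop the first column of every pair whose |correlation| exceeds 0.9.
to_drop <- colnames(abs_cor)[apply(abs_cor, 2, function(col) any(col > 0.9))]
reduced <- records[, !(names(records) %in% to_drop), drop = FALSE]

print(to_drop)
print(names(reduced))
```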

-> Correlation Matrix (After Cleaning)

Converting Categorical Attributes to Numeric
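
Two standard encodings are integer codes (via `factor`) and one-hot indicator columns (via `model.matrix`). Which one the project used is not stated here, so both are shown as a sketch on made-up data with assumed column names.

```r
# Hypothetical sample; column names are illustrative assumptions.
killings <- data.frame(
  gender = c("Male", "Female", "Male", "Male"),
  armed  = c("No", "Firearm", "Knife", "No"),
  stringsAsFactors = FALSE
)

# Option 1: integer codes. Levels are ordered alphabetically, so the
# numbers carry no real ordering and should be treated as labels.
encoded <- killings
encoded[] <- lapply(killings, function(x) as.integer(factor(x)))
print(encoded)

# Option 2: one-hot encoding, one 0/1 column per category of `armed`.
one_hot <- model.matrix(~ armed - 1, data = killings)
print(one_hot)
```

Integer codes are compact but imply an order that distance-based methods such as K-Means will take literally; one-hot columns avoid that at the cost of more dimensions.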

Feature Selection / Dimensionality Reduction

-> Random Forest
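
Random forest variable importance can be used to rank features. The sketch assumes the `randomForest` package from CRAN is installed; the data is simulated so that only the column named `signal` determines the label, and the exact settings used in the project are not known from this README.

```r
library(randomForest)  # CRAN package, assumed to be installed

set.seed(42)
n <- 200

# Simulated data: the label depends only on `signal`.
toy <- data.frame(
  signal = rnorm(n),
  noise1 = rnorm(n),
  noise2 = rnorm(n)
)
toy$label <- factor(ifelse(toy$signal > 0, "A", "B"))

rf_model <- randomForest(label ~ ., data = toy, ntree = 200, importance = TRUE)

# type = 2 returns the mean decrease in Gini impurity per feature.
imp <- importance(rf_model, type = 2)
print(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE])
```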

-> Recursive Feature Elimination (RFE)

-> Principal Component Analysis (PCA)
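
PCA is available in base R through `prcomp`. The sketch runs on simulated features (two of them nearly collinear) and keeps enough components to explain 95% of the variance; that threshold is an assumption chosen for illustration.

```r
set.seed(7)
n <- 100

# Simulated features: x2 is almost a copy of x1, x3 is independent.
x1 <- rnorm(n)
features <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),
  x3 = rnorm(n)
)

# Center and scale so every feature contributes on the same footing.
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Share of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)
print(round(cum_var, 3))

# Keep the smallest number of components reaching 95% cumulative variance.
n_components <- which(cum_var >= 0.95)[1]
reduced <- pca$x[, seq_len(n_components), drop = FALSE]
print(dim(reduced))
```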

->Boruta Algorithm

Dimensionality Reduction

Elbow Method to Find the Number of Clusters
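
The elbow method runs K-Means for a range of `k` values and looks for the point where the total within-cluster sum of squares stops dropping sharply. The sketch uses simulated points with three well-separated groups, so it illustrates the technique rather than reproducing the project's result.

```r
set.seed(123)

# Simulated 2-D points forming three well-separated groups.
coords <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..8.
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(coords, centers = k, nstart = 20)$tot.withinss
})

print(data.frame(k = k_values, wss = round(wss, 1)))

# plot(k_values, wss, type = "b") would show the bend ("elbow") visually.
```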

K-Means Clustering
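
Once a cluster count is chosen, K-Means assigns a label to every record, and per-cluster means give a first look at each cluster's properties. The sketch uses simulated data; the feature names `f1` and `f2` and the choice of three clusters are assumptions.

```r
set.seed(123)

# Simulated 2-D points forming three well-separated groups.
coords <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)
colnames(coords) <- c("f1", "f2")

# Scale features first; nstart > 1 guards against a poor random start.
km <- kmeans(scale(coords), centers = 3, nstart = 25)

# Attach the cluster label to every record.
labelled <- data.frame(coords, cluster = factor(km$cluster))
print(table(labelled$cluster))

# Mean of each feature per cluster, on the original scale.
print(aggregate(. ~ cluster, data = labelled, FUN = mean))
```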

Hierarchical Clustering (Agglomerative Method)
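
Agglomerative clustering in base R combines `dist`, `hclust`, and `cutree`. The linkage method (`ward.D2` here) and the number of clusters are assumptions for this sketch, which runs on simulated data.

```r
set.seed(321)

# Simulated 2-D points forming two well-separated groups.
coords <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.3), ncol = 2)
)

# Pairwise Euclidean distances on scaled features.
d <- dist(scale(coords), method = "euclidean")

# Bottom-up merging of records with Ward's criterion.
hc <- hclust(d, method = "ward.D2")

# Cut the tree so that exactly two clusters remain.
hc_labels <- cutree(hc, k = 2)
print(table(hc_labels))

# plot(hc) would draw the dendrogram.
```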

Comparison of K-Means and Hierarchical Clustering
-> K-Means

-> Hierarchical Clustering
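
One simple way to compare the two methods is to cross-tabulate their labels on the same records. This sketch uses simulated data and is not the comparison performed in the project.

```r
set.seed(99)

# Simulated 2-D points forming two well-separated groups.
coords <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.3), ncol = 2)
)

km_labels <- kmeans(coords, centers = 2, nstart = 25)$cluster
hc_labels <- cutree(hclust(dist(coords), method = "ward.D2"), k = 2)

# Cluster numbers are arbitrary, so agreement means every row of the
# table has exactly one non-zero cell, not that the numbers match.
agreement <- table(kmeans = km_labels, hclust = hc_labels)
print(agreement)
```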

Artificial Neural Network (ANN)
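
A small feed-forward network can be trained to predict the cluster label from the features. This sketch uses the `nnet` package, chosen because it ships with standard R distributions; the project may well have used a different package (reference [7] covers `neuralnet`). The data, the feature names, the 70/30 split, and the network size are all assumptions.

```r
library(nnet)  # recommended package bundled with standard R installs

set.seed(2024)
n_per_group <- 50

# Simulated features with a known cluster label for each record.
toy <- data.frame(
  f1 = c(rnorm(n_per_group, 0, 0.5), rnorm(n_per_group, 3, 0.5),
         rnorm(n_per_group, 6, 0.5)),
  f2 = c(rnorm(n_per_group, 0, 0.5), rnorm(n_per_group, 3, 0.5),
         rnorm(n_per_group, 6, 0.5)),
  cluster = factor(rep(c("c1", "c2", "c3"), each = n_per_group))
)

# Neural networks train more reliably on scaled inputs.
toy[, c("f1", "f2")] <- scale(toy[, c("f1", "f2")])

# 70/30 train/test split.
train_idx <- sample(nrow(toy), size = round(0.7 * nrow(toy)))

# One hidden layer with 4 units; a little weight decay for stability.
ann <- nnet(cluster ~ f1 + f2, data = toy[train_idx, ],
            size = 4, decay = 1e-3, maxit = 500, trace = FALSE)

pred <- predict(ann, newdata = toy[-train_idx, ], type = "class")
accuracy <- mean(pred == toy$cluster[-train_idx])
print(accuracy)
```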

Decision Trees (DT)
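
A decision tree classifier for the cluster label can be fitted with `rpart`, which is bundled with standard R distributions. Whether the project used `rpart` or another tree package is not stated here; the data, feature names, and split ratio are assumptions for this sketch.

```r
library(rpart)  # recommended package bundled with standard R installs

set.seed(11)
n_per_group <- 50

# Simulated features with a known cluster label for each record.
toy <- data.frame(
  f1 = c(rnorm(n_per_group, 0, 0.5), rnorm(n_per_group, 3, 0.5),
         rnorm(n_per_group, 6, 0.5)),
  f2 = c(rnorm(n_per_group, 0, 0.5), rnorm(n_per_group, 3, 0.5),
         rnorm(n_per_group, 6, 0.5)),
  cluster = factor(rep(c("c1", "c2", "c3"), each = n_per_group))
)

# 70/30 train/test split.
train_idx <- sample(nrow(toy), size = round(0.7 * nrow(toy)))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Classification tree predicting the cluster label.
tree <- rpart(cluster ~ f1 + f2, data = train, method = "class")

pred <- predict(tree, newdata = test, type = "class")

# Confusion matrix and overall accuracy on the held-out records.
confusion <- table(predicted = pred, actual = test$cluster)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)
```

Unlike the neural network, the fitted tree can be read directly with `print(tree)`, which makes it easier to see which feature thresholds separate the clusters.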

CONCLUSION
This project involved a great deal of research into data mining, and I learned a lot about the field along the way. Although the results are not definitive and the research is still ongoing, I gained useful knowledge and experience. Preprocessing took most of my time: the algorithms can only be applied correctly once the data has been cleaned properly. One finding from the data is that unarmed people make up a distinct portion of the records. I learned many interesting things from this assignment, and it was a really good experience for me.

REFERENCES
[1] https://www.theguardian.com/us-news/ng-interactive/2015/jun/01/the-counted-police-killings-us-database
The Counted is a project by the Guardian working to count the number of people killed by police and other law enforcement agencies in the United States throughout 2015 and 2016, to monitor their demographics and to tell the stories of how they died.

[2] https://www.kaggle.com/fivethirtyeight/fivethirtyeight-police-killings-dataset
This web page contains the dataset behind the story Where Police Have Killed Americans In 2015.

[3] https://www.kaggle.com/stevechadwick/police-killings-analysis/
This web page contains useful information about the dataset.

[4] https://fivethirtyeight.com/features/where-police-have-killed-americans-in-2015/
A very good article by Ben Casselman, "Where Police Have Killed Americans In 2015", published on June 3, 2015. Ben Casselman is a senior editor and the chief economics writer for FiveThirtyEight.

[5] https://www.analyticsindiamag.com/data-preprocessing-with-r-hands-on-tutorial/
Basics of data preprocessing in R, a tutorial for beginners.

[6] https://cran.r-project.org/web/packages/
The official R package repository (CRAN). I read the documentation of many packages, especially randomForest.

[7] https://www.analyticsvidhya.com/blog/2017/09/creating-visualizing-neural-network-in-r/
Artificial Neural Network Example

License

Data Mining R Project is licensed under the MIT license. See LICENSE for more information.

Project Status

You can download the latest release from this repository.

Disclaimer

This project was prepared and shared for educational purposes only. You can use or edit any file as you wish :)

About

Süha TANRIVERDİ, Çankaya University, Computer Engineering