/Business-Intelligence-Big-Data-Assignment

Analysis of Los Angeles Crimes 2010-19

Primary language: Jupyter Notebook

LA CRIME ANALYSIS

Assignment for the course Business Intelligence and Big Data Analytics in the 4th year winter semester:

Use SQL Server (database and analysis services) or MySQL + Pentaho:

  1. A large data set will be found, cleaned and loaded into a data warehouse. A data cube and various metrics should then be created. [40%]
  2. A visualization tool (Tableau or Power BI) will be used to create various instances of data visualization. [20%]
  3. The warehouse data will be used for some data mining operations, such as classification, association rules, clustering, etc. Use the methods and models of a commercial system or an open-source tool. Implement at least two models. [40%]
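As a rough illustration of the data cube in step 1, the sketch below counts incidents over every combination of three dimensions, with `None` standing for "all values" of a rolled-up dimension. The dimension names (`year`, `area`, `crime_type`), the `build_cube` helper and the sample rows are made up for the example; they are not taken from the dataset or from the SQL Server or Pentaho cube the assignment asks for.

```python
from collections import Counter
from itertools import product

# Hypothetical dimensions and rows, for illustration only.
DIMENSIONS = ("year", "area", "crime_type")
incidents = [
    {"year": 2010, "area": "Central", "crime_type": "Burglary"},
    {"year": 2010, "area": "Central", "crime_type": "Theft"},
    {"year": 2011, "area": "Hollywood", "crime_type": "Theft"},
    {"year": 2011, "area": "Central", "crime_type": "Theft"},
]


def build_cube(rows, dimensions):
    """Count rows for every roll-up of the given dimensions.

    Each cube key is a tuple with one slot per dimension; None in a slot
    means that dimension has been aggregated away ("all").
    """
    cube = Counter()
    for row in rows:
        values = [row[d] for d in dimensions]
        # Each dimension is either kept (True) or rolled up (False).
        for keep in product((True, False), repeat=len(dimensions)):
            key = tuple(v if k else None for v, k in zip(values, keep))
            cube[key] += 1
    return cube


cube = build_cube(incidents, DIMENSIONS)
total = cube[(None, None, None)]              # all incidents
thefts = cube[(None, None, "Theft")]          # thefts across all years and areas
central_2010 = cube[(2010, "Central", None)]  # any crime type in Central, 2010
```

A real cube would live in the warehouse and usually carry more measures than a plain count; the point here is only how one fact row contributes to every aggregation level.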

The analysis covers crimes in Los Angeles from 2010 to 2019. All the information about the dataset can be found here. The repository contains:

  • LA CRIMES PRESENTATION.pdf is the presentation of the analysis, its application and conclusions. It is the quickest way to review the analysis.
  • Cleaning data.ipynb is the notebook that runs the ETL process (mostly data extraction and cleaning).
  • Data Visualizations.ipynb contains some examples of complex visualizations of the crime data in Python. For all the visualizations, see the VISUALIZATIONS section of the book.
  • Clustering.ipynb describes three different clustering analyses, with their visualizations and conclusions.
  • Machine Learning.ipynb trains decision trees and their extensions to predict the crime type of individual incidents from the remaining attributes.
  • LA Crimes Book Greek.pdf is the complete book of the detailed analysis, written in Greek.
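To give a feel for the kind of cleaning an ETL notebook such as Cleaning data.ipynb might perform, here is a minimal sketch using only the Python standard library. It is not the notebook's code: the column names (`date_occurred`, `area`, `crime_type`), the date format and the sample rows are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw extract; real column names and formats may differ.
RAW_CSV = """date_occurred,area,crime_type
01/15/2010, Central ,BURGLARY
,Hollywood,THEFT
03/02/2011,hollywood,theft
03/02/2011,hollywood,theft
13/45/2012,Central,THEFT
"""


def clean_rows(text):
    """Return cleaned, de-duplicated rows from a CSV string."""
    cleaned, seen = [], set()
    for row in csv.DictReader(io.StringIO(text)):
        # Trim stray whitespace in every field.
        row = {k: (v or "").strip() for k, v in row.items()}
        if not all(row.values()):
            continue  # drop rows with missing fields
        try:
            date = datetime.strptime(row["date_occurred"], "%m/%d/%Y").date()
        except ValueError:
            continue  # drop rows with unparseable dates
        # Normalize the date to ISO format and the text fields to title case.
        record = (date.isoformat(), row["area"].title(), row["crime_type"].title())
        if record in seen:
            continue  # drop exact duplicates
        seen.add(record)
        cleaned.append(dict(zip(("date_occurred", "area", "crime_type"), record)))
    return cleaned


rows = clean_rows(RAW_CSV)
```

Of the five sample rows, one has a missing date, one has an invalid date and one is a duplicate, so only two survive.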

Execution

First, follow the instructions and run Cleaning data.ipynb, then run the other notebooks.
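One way to script that order is sketched below. It assumes Jupyter with `nbconvert` is installed and that the notebooks sit in the current directory, neither of which is stated here. Only "Cleaning data.ipynb first" comes from this README; the order of the remaining three notebooks and the helper names `build_command` and `run_all` are assumptions.

```python
import subprocess

# Cleaning data.ipynb must come first; the order of the rest is assumed.
NOTEBOOKS = [
    "Cleaning data.ipynb",
    "Data Visualizations.ipynb",
    "Clustering.ipynb",
    "Machine Learning.ipynb",
]


def build_command(notebook):
    """Build the nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", "--inplace",
        notebook,
    ]


def run_all(notebooks=NOTEBOOKS):
    """Run the notebooks in order, stopping at the first failure."""
    for notebook in notebooks:
        subprocess.run(build_command(notebook), check=True)
```

Calling `run_all()` from the repository root would execute each notebook and overwrite it with its outputs; opening the notebooks in Jupyter and running them by hand in the same order works just as well.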