This project provides tools and methods for analyzing crime data across different regions in the UK. It consolidates crime records from multiple CSV files, calculates summary statistics, and offers breakdowns by crime type and jurisdiction.
- Overview
- Data Requirements
- Project Structure
- Installation and Setup
- Usage
- Known Issues
- Contributing
- License
The main objectives of this project are:
- To combine multiple CSV files of crime data into a single DataFrame.
- To calculate key statistics, such as the earliest and latest data records and total crime counts.
- To generate breakdowns of crimes by type and jurisdiction.
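The first objective, combining many CSV files into one DataFrame, could be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the function name `combine_csv_files` is hypothetical, and the recursive search assumes the downloaded files may sit in subfolders.

```python
from pathlib import Path

import pandas as pd


def combine_csv_files(folder):
    """Read every CSV under `folder` and stack the rows into one DataFrame."""
    # rglob also searches subfolders (assumption: downloads may be nested)
    csv_paths = sorted(Path(folder).rglob("*.csv"))
    if not csv_paths:
        raise FileNotFoundError(f"No CSV files found under {folder}")
    frames = [pd.read_csv(path) for path in csv_paths]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

Calling `combine_csv_files("raw_data")` from the project root would return the combined data. Since the project lists tqdm as a dependency, wrapping `csv_paths` in `tqdm(...)` inside the list comprehension is one way to show progress while the files load.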
To run this project, you will need to download crime data files from Data.Police.UK. Follow these steps:
- Go to the Data.Police.UK website.
- Select the specific region(s) and time period of interest.
- Download the files in CSV format.
- Save all downloaded files in a folder named raw_data in the root directory of this project.
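Before running the analysis, it can help to confirm that the downloaded files are where the project expects them. The sketch below uses only the standard library. The helper name `list_raw_csv_files` is hypothetical and not part of this project, and the recursive search is an assumption that covers downloads extracted into subfolders of raw_data.

```python
from pathlib import Path


def list_raw_csv_files(project_root="."):
    """Return the CSV files found under <project_root>/raw_data, sorted by path."""
    raw_dir = Path(project_root) / "raw_data"
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"Expected a folder at {raw_dir}")
    # Only files ending in .csv are returned; other files are ignored
    return sorted(raw_dir.rglob("*.csv"))
```

Running `len(list_raw_csv_files())` from the project root gives a quick count of the files that will be picked up. An empty list suggests the files were saved elsewhere or not in CSV format.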
The main folders in this project are:
- raw_data: Contains raw CSV files of downloaded crime data.
- output_data: Stores combined and processed data outputs, including .csv and .pkl files.
- Clone the Repository
      git clone https://github.com/your-username/crime-data-analysis.git
      cd crime-data-analysis
- Install Required Packages
  Make sure you have Python installed. You can install the required packages with:
      pip install -r requirements.txt
  The main libraries used include pandas for data handling and tqdm for progress tracking.
- Set Up Data
  - Download the crime data as described above.
  - Place all files in the raw_data folder.
- Run the Jupyter Notebook
  Start Jupyter Notebook in the project directory:
      jupyter notebook
  Open the crime_data_analysis.ipynb notebook and execute each cell to load data, perform analysis, and save results.
- Run Data Loading and Processing
  The notebook will:
  - Combine all CSV files in the raw_data folder.
  - Calculate and display statistics on the date range, total crime count, and breakdowns by crime type and jurisdiction.
  - Save combined data to output_data as both .csv and .pkl files.
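The statistics and output steps the notebook performs could look roughly like the sketch below. It is an illustration under stated assumptions, not the notebook's actual code: the column names `Month`, `Crime type`, and `Falls within` are assumed to match the downloaded CSV headers, `Month` is assumed to hold `YYYY-MM` strings, and the function names and the `combined_crime_data` file stem are hypothetical.

```python
from pathlib import Path

import pandas as pd


def summarise_crimes(crimes):
    """Return the date range, total count, and two breakdowns for a crime DataFrame."""
    # Assumed column: "Month" holds strings such as "2023-01"
    months = pd.to_datetime(crimes["Month"], format="%Y-%m")
    return {
        "earliest": months.min(),
        "latest": months.max(),
        "total": len(crimes),
        # Assumed columns: "Crime type" and "Falls within" (the jurisdiction)
        "by_type": crimes["Crime type"].value_counts(),
        "by_jurisdiction": crimes["Falls within"].value_counts(),
    }


def save_outputs(crimes, output_dir="output_data", stem="combined_crime_data"):
    """Write the DataFrame to <output_dir> as both a .csv and a .pkl file."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    crimes.to_csv(out / f"{stem}.csv", index=False)
    crimes.to_pickle(out / f"{stem}.pkl")
```

The .pkl file preserves pandas dtypes and reloads quickly with `pd.read_pickle`, while the .csv copy stays readable in other tools, which is a plausible reason for saving both.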
Refer to the changelog on Data.Police.UK for known issues with the dataset. Understanding these issues matters when assessing the reliability of the data, because limitations or errors in the source records can affect analysis results.
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Commit your changes (git commit -m 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Open a Pull Request.