Barcelona Vehicle Data Analysis: Pandas vs. Polars with Time Benchmarks
This repository analyzes vehicle data from the city of Barcelona, with a focus on comparing the performance of the Pandas and Polars libraries. Beyond showcasing what each library can do, the project benchmarks execution time for a set of common operations and reports where the two libraries differ in speed.
Data Overview
The dataset used for this analysis includes the following columns:
- Any: Year.
- Codi_Districte: District code.
- Nom_Districte: District name.
- Codi_Barri: Neighbourhood code.
- Nom_Barri: Neighbourhood name.
- Seccio_Censal: Census section.
- Tipus_Vehicles: Type of vehicle.
- Antiguitat: Vehicle age (years of operation).
- Nombre: Quantity.
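Because the column names are in Catalan, a small lookup table can make exploratory code easier to read. The sketch below is illustrative only: the English snake_case names are assumptions chosen for this example, not names used by the dataset or the notebook.

```python
# Catalan column names from the dataset mapped to English labels.
# The English names on the right are hypothetical, chosen for readability.
COLUMN_TRANSLATIONS = {
    "Any": "year",
    "Codi_Districte": "district_code",
    "Nom_Districte": "district_name",
    "Codi_Barri": "neighbourhood_code",
    "Nom_Barri": "neighbourhood_name",
    "Seccio_Censal": "census_section",
    "Tipus_Vehicles": "vehicle_type",
    "Antiguitat": "vehicle_age",
    "Nombre": "quantity",
}


def translate_header(columns):
    """Return the column names in English, leaving unknown names unchanged."""
    return [COLUMN_TRANSLATIONS.get(name, name) for name in columns]


print(translate_header(["Any", "Nom_Barri", "Nombre"]))
# prints ['year', 'neighbourhood_name', 'quantity']
```

The same dictionary can be passed to `df.rename(columns=COLUMN_TRANSLATIONS)` in Pandas or `df.rename(COLUMN_TRANSLATIONS)` in Polars, provided all nine columns are present in the loaded frame.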
Analysis Highlights
- Data Exploration: Initial overview and understanding of the dataset attributes.
- Data Processing using Pandas and Polars: Cleaned, transformed, and prepared the dataset for analysis using both libraries.
- Performance Benchmarking: Measured and documented the execution time of key operations (data loading, filtering, aggregations) in both Pandas and Polars, giving a quantitative comparison of the two libraries' speed on this dataset.
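One way such a timing comparison could be set up is sketched below. This is not necessarily how the notebook measures time: the `benchmark` and `compare_libraries` helpers, the `csv_path` argument, and the choice of a load-plus-aggregate task are all assumptions made for illustration. Only the column names `Nom_Districte` and `Nombre` come from the dataset description.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(func: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Call func `repeats` times; return min and median wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return {"min_s": min(timings), "median_s": statistics.median(timings)}


def compare_libraries(csv_path: str, repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Time the same load-and-aggregate task in Pandas and Polars.

    csv_path is a hypothetical path to the vehicle CSV file.
    """
    # Imported lazily so benchmark() stays usable without either library.
    import pandas as pd
    import polars as pl

    def pandas_task():
        df = pd.read_csv(csv_path)
        return df.groupby("Nom_Districte")["Nombre"].sum()

    def polars_task():
        df = pl.read_csv(csv_path)
        # group_by is the spelling in recent Polars releases;
        # older releases used groupby instead.
        return df.group_by("Nom_Districte").agg(pl.col("Nombre").sum())

    return {
        "pandas": benchmark(pandas_task, repeats),
        "polars": benchmark(polars_task, repeats),
    }
```

Reporting the minimum and median over several repeats, rather than a single run, reduces the influence of disk caching and background load on the comparison.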
Setup and Execution
Ensure you have Jupyter and the necessary libraries installed:
pip install numpy pandas polars jupyterlab
Then, navigate to the repository's directory and launch:
jupyter lab
This will open JupyterLab, where you can explore the notebook interactively.
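If the notebook fails on import, a quick check of the environment can help. The snippet below is a minimal sketch using only the Python standard library; the `installed_versions` helper is a hypothetical name introduced here, and the package list assumes the libraries named in the install step.

```python
from importlib import metadata

# Distribution names assumed from the install step above.
REQUIRED = ["numpy", "pandas", "polars", "jupyterlab"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```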
Contributions
Enhancements and insights are welcome! Whether it's refining the benchmarks, enhancing visualizations, or adding more depth to the analysis, please fork this repository and submit pull requests for improvements.