This is my third-year project, which aims to predict air toxicity levels in Mumbai, India, from pollutant concentrations at three locations: Chakala, Kurla, and the Airport. The goal is to find the best-performing algorithm for the pollutant data among several machine learning models: Linear Regression, Random Forest, Decision Tree, LSTM, and XGBoost. The models' predictions are compared using root mean squared error (RMSE), where a lower value indicates a closer fit.
The project is organized into two main folders: `data` and `notebook`.
The `data` folder contains two subfolders: `raw_data` and `processed_data`.
- `raw_data`: Contains the raw pollutant concentration data for the three locations in Mumbai.
- `processed_data`: Contains the preprocessed data, ready for analysis and modeling.

The `notebook` folder contains Python notebooks organized into four subfolders: `EDA`, `Prediction`, `Forecast`, and a temporary/dump folder.
- `EDA`: Contains notebooks for exploratory data analysis.
- `Prediction`: Contains notebooks for building and evaluating prediction models using various machine learning algorithms.
- `Forecast`: Contains notebooks for forecasting pollutant concentrations.
- `temp/dump`: A temporary folder for storing miscellaneous files and work-in-progress notebooks.

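
The RMSE comparison mentioned in the overview can be sketched roughly as follows. This is a minimal illustration, not the code used in the notebooks: the helper names `rmse` and `rank_models`, the model labels, and all numeric values are placeholders chosen for the example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute RMSE of empty sequences")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_models(actual, predictions_by_model):
    """Return (model_name, rmse) pairs sorted from lowest to highest RMSE."""
    scores = {
        name: rmse(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder values for illustration only; not measurements from this project.
actual = [40.0, 55.0, 62.0, 48.0]
predictions_by_model = {
    "model_a": [42.0, 53.0, 60.0, 50.0],
    "model_b": [35.0, 60.0, 70.0, 45.0],
}

ranking = rank_models(actual, predictions_by_model)
best_name, best_rmse = ranking[0]
print(f"best: {best_name} (RMSE={best_rmse:.2f})")
```

In practice, each model's predictions would come from the same held-out portion of the data so that the RMSE values are directly comparable.
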
- sciencedirect.com: Modeling air quality prediction using a deep learning approach: Method optimization and evaluation.
- journalofbigdata.springeropen.com: Air-pollution prediction in smart city, deep learning approach.
- Clone the repository to your local machine.
- Install the required Python packages.
- Run the notebooks in the following order: Preprocessing, EDA, Prediction, and Forecast.
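
To make the run order above concrete, here is a small sketch that lists notebooks stage by stage. It assumes each stage corresponds to a subfolder of the same name under `notebook`; that mapping is an assumption (in particular, this README does not say where the Preprocessing notebooks live), and the function name `notebooks_in_run_order` is hypothetical.

```python
from pathlib import Path

# Stage order taken from the run instructions. The assumption that each stage
# maps to a same-named subfolder under notebook/ is hypothetical.
STAGE_ORDER = ["Preprocessing", "EDA", "Prediction", "Forecast"]


def notebooks_in_run_order(notebook_root):
    """List .ipynb files stage by stage, skipping stages with no folder."""
    root = Path(notebook_root)
    ordered = []
    for stage in STAGE_ORDER:
        stage_dir = root / stage
        if stage_dir.is_dir():
            # Sort by file name so the order within a stage is stable.
            ordered.extend(sorted(stage_dir.glob("*.ipynb")))
    return ordered


if __name__ == "__main__":
    for path in notebooks_in_run_order("notebook"):
        print(path)
```

Stages without a matching folder are skipped rather than treated as errors, so the sketch still works if the actual layout differs from the assumed one.
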