Air Toxicity Prediction

This is my third-year project. It predicts air toxicity levels in Mumbai, India, from pollutant concentrations measured at three locations: Chakala, Kurla, and the Airport. The goal is to find the best-performing algorithm for this pollutant data by comparing several machine learning models: Linear Regression, Decision Tree, Random Forest, XGBoost, and LSTM. Each model's predictions are scored with root mean squared error (RMSE), where a lower value means the predictions are closer to the measured concentrations.
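
As a rough illustration of the RMSE comparison, the sketch below scores two sets of predictions against the same measured values and picks the one with the lower error. The names model_a and model_b, the rmse helper, and all the numbers are hypothetical placeholders chosen for the example; they are not results from this project or code from its notebooks.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative numbers only, not measurements or results from this project.
actual = [40.0, 42.0, 45.0, 50.0]
predictions = {
    "model_a": [41.0, 43.0, 44.0, 49.0],  # every prediction is off by 1
    "model_b": [38.0, 44.0, 47.0, 52.0],  # every prediction is off by 2
}

# Score each model on the same held-out values, then pick the lowest RMSE.
scores = {name: rmse(actual, preds) for name, preds in predictions.items()}
best_model = min(scores, key=scores.get)

print(scores)
print("Lowest RMSE:", best_model)
```

Because RMSE is expressed in the same units as the pollutant concentration, the scores are only comparable when every model is evaluated on the same held-out data.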

Project Structure

The project is organized into two main folders: data and notebook.

Data

The data folder contains two subfolders: raw_data and processed_data.

raw_data: Contains the raw pollutant concentration data for the three locations in Mumbai.
processed_data: Contains the preprocessed data, ready for analysis and modeling.

Notebook

The notebook folder contains Python notebooks organized into four subfolders: EDA, Prediction, Forecast, and a temporary/dump folder.

EDA: Contains notebooks for exploratory data analysis.
Prediction: Contains notebooks for building and evaluating prediction models using various machine learning algorithms.
Forecast: Contains notebooks for forecasting pollutant concentrations.
temp/dump: A temporary folder for storing miscellaneous files and work-in-progress notebooks.
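
The Forecast notebooks are not described in detail here, so the following is only a sketch of one common way to frame forecasting as a supervised problem: each target value is predicted from the few readings that precede it, and the data is split chronologically rather than shuffled. The function names, the lag count, and the readings are assumptions made for this example and may differ from what the notebooks actually do.

```python
def make_lagged_samples(series, n_lags):
    """Build (features, target) pairs where features are the previous n_lags values."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features, targets = [], []
    for i in range(n_lags, len(series)):
        features.append(series[i - n_lags:i])
        targets.append(series[i])
    return features, targets


def chronological_split(features, targets, test_fraction=0.25):
    """Split without shuffling so the test samples are later than the training samples."""
    n_test = max(1, int(len(features) * test_fraction))
    split = len(features) - n_test
    return features[:split], targets[:split], features[split:], targets[split:]


# Hypothetical pollutant readings in time order (placeholder values).
readings = [30, 32, 35, 33, 31, 36, 40, 38]

X, y = make_lagged_samples(readings, n_lags=3)
X_train, y_train, X_test, y_test = chronological_split(X, y)

print("training samples:", len(X_train))
print("test features:", X_test, "test targets:", y_test)
```

Keeping the split in time order avoids evaluating a model on readings that precede its training data, which would make the reported error look better than it is.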

References

sciencedirect.com: Modeling air quality prediction using a deep learning approach: Method optimization and evaluation.
journalofbigdata.springeropen.com: Air-pollution prediction in smart city, deep learning approach.

Additional Data Sources

Getting Started

Clone the repository to your local machine.
Install the required Python packages.
Run the notebooks in the following order: Preprocessing, EDA, Prediction, and Forecast.