WaveCatBoost Time Series Model

WaveCatBoost for Probabilistic Forecasting of Regional Air Quality Data

Accurate and reliable air quality forecasting is essential for protecting public health, sustainable development, pollution control, and enhanced urban planning. This letter presents a novel WaveCatBoost architecture designed to forecast the real-time concentrations of air pollutants by combining the maximal overlapping discrete wavelet transform (MODWT) with the CatBoost model. This hybrid approach efficiently transforms time series into high-frequency and low-frequency components, thereby extracting signal from noise and improving prediction accuracy and robustness. Evaluation of two distinct regional datasets, from the Central Air Pollution Control Board (CPCB) sensor network and a low-cost air quality sensor system (LAQS), underscores the superior performance of our proposed methodology in real-time forecasting compared to the state-of-the-art statistical and deep learning architectures. Moreover, we employ a conformal prediction strategy to provide probabilistic bands with our forecasts.

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2404.05482 [cs.LG]

(or arXiv:2404.05482v1 [cs.LG] for this version)