Hey everyone! During the Spring '22 semester, I enrolled in the Project Seminar course for the B.A./M.A. program. Using data from the NYC Department of Sanitation, I analyzed the waste tonnage collected across the five boroughs. I imported the data into R through an API, using the RSocrata package.
- The challenge I took on was to build a univariate time series model of the total waste collected in each of the five boroughs
- Each borough's time series was fitted with a seasonal ARIMA model
- A common pattern across the models was fitting the differenced series with non-seasonal MA terms and seasonal AR terms
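The pattern above (a differenced series with a non-seasonal MA term and a seasonal AR term) can be sketched in base R. This is only an illustration under assumptions: the built-in `AirPassengers` series stands in for a borough's monthly tonnage, and the orders (one regular and one seasonal difference, one MA term, one seasonal AR term) are made up for the example, not the orders selected in the project.

```r
# Stand-in data: the built-in AirPassengers monthly series,
# NOT the DSNY tonnage data used in the project.
y <- log(AirPassengers)

# Seasonal ARIMA(0,1,1)(1,1,0)[12]: the orders are illustrative assumptions.
fit <- arima(
  y,
  order    = c(0, 1, 1),                               # non-seasonal: d = 1, one MA term
  seasonal = list(order = c(1, 1, 0), period = 12)     # seasonal: one AR term, D = 1
)

# Estimated coefficients: "ma1" (non-seasonal MA) and "sar1" (seasonal AR)
print(coef(fit))

# Forecast the next 12 months (point forecasts and standard errors, on the log scale)
fc <- predict(fit, n.ahead = 12)
print(fc$pred)
```

In practice each borough would get its own fit, with the orders chosen from the ACF/PACF plots or an information criterion rather than fixed in advance.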
- A preliminary multiple linear regression model was fitted, with the total tonnage collected in NYC per month regressed on external variables
- This model returned an adjusted R-squared of 0.41
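For readers who want to see where an adjusted R-squared comes from, here is a minimal sketch of a multiple linear regression in base R. Everything in it is hypothetical: the data are simulated, and the column names (`avg_temp`, `unemployment`, `cpi`, `tonnage`) and coefficients are invented for illustration, so the number it prints will not match the 0.41 reported above.

```r
set.seed(42)
n <- 96  # eight years of monthly observations (made-up length)

# Simulated stand-ins for the external variables (hypothetical values)
sim <- data.frame(
  avg_temp     = 55 + 20 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 3),
  unemployment = runif(n, min = 4, max = 10),
  cpi          = 250 + 0.3 * seq_len(n) + rnorm(n, sd = 1)
)

# Simulated response: monthly tonnage driven by two predictors plus noise
sim$tonnage <- 250000 + 600 * sim$avg_temp - 2000 * sim$unemployment +
  rnorm(n, sd = 15000)

# Regress tonnage on the external variables
fit_lm <- lm(tonnage ~ avg_temp + unemployment + cpi, data = sim)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- summary(fit_lm)$adj.r.squared
print(adj_r2)
```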
- A dynamic regression model was then introduced
- This type of model allows the errors from a regression model to contain autocorrelation
- These models have two error terms: the error from the regression model, denoted 𝜂_𝑡, and the error from the ARIMA model, denoted 𝜀_𝑡
- Only the ARIMA model errors are assumed to be white noise
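Written out, the two error terms fit together as follows. This is a sketch in the notation of Forecasting: Principles and Practice; the number of predictors $k$ and the ARIMA(1,1,1) error structure in the last line are illustrative assumptions, not the orders used in the project.

```latex
% Regression of the response on k external predictors, with error \eta_t
y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t

% The regression error \eta_t follows an ARIMA process driven by \varepsilon_t
\phi(B)\,(1 - B)^d\,\eta_t = \theta(B)\,\varepsilon_t,
\qquad \varepsilon_t \sim \text{white noise}

% Example: ARIMA(1,1,1) errors (assumed orders, for illustration only)
(1 - \phi_1 B)(1 - B)\,\eta_t = (1 + \theta_1 B)\,\varepsilon_t
```

Here $B$ is the backshift operator, so $B\eta_t = \eta_{t-1}$. The regression error $\eta_t$ may be autocorrelated, while $\varepsilon_t$ is the only term assumed to be white noise.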
- Average temperature, average precipitation, average number of cooling degree days
- NYC unemployment rate
- NYS-NJ Consumer Price Index
- Forecasting: Principles and Practice, third edition
- Advanced R
- R for Data Science
- Numerous research papers taking on a similar task within other countries