This R language file builds a multiple linear regression model intended to predict the busyness at the front desk for the hotel I have worked at.
I collected the data myself with a little help from a friend from the Marriott's MARSHA reservations database, as well as defining and tracking a metric I am calling "service events" (more on that below).
Do not use this data for anything without my explicit permission.
- Recording Day of Week: The day of the week the data was recorded.
- Lagged Day of Week: This is a holdover from my analysis of this data as a time series, where I could use an Excel sheet to change the lag on the predictor variables. It's currently set to not lag, since the intervals are too uneven for a time series analysis.
- Shift: The shift that the data is true for. AM shifts are from 7am-3pm, and PM shifts last from 3pm-11pm (very little occurs on 11pm-7am shifts, and I do not work those shifts).
- Arrivals: The number of guests who have reservations for a given day and shift.
- Departures: This is recorded for the AM shift, but typically not for the PM shift, since the hotel's checkout time is 3pm, making departures insignificant for the PM shift's busyness.
- Occupancy: This is the percentage of rooms sold at the start of the given shift, given as a number between 0 and 100.
- Guests per Occ: Guests per Occupancy Percentage. This is the total number of guests in the hotel divided by the occupancy percentage, which is a measure of room density.
- NCGR: Standing for Non-Corportate Group Rooms, this counts the number of rooms sold to guest who are traveling for an event that is non-professional in nature (e.g. the RootsTech conference or FanX). These rooms are sometimes a part of a contract with the event organizer or an affiliate. If they are from a contract, this comes from getting the arrivals and departures from the MARSHA database for the defined group. Otherwise, these numbers come from manual counting of people I have reason to believe are traveling for the event in question.
- Service Events: This counts the number of distinct encounters with people who need at least one task done by the desk. For instance, if a guest requested a towel, and another one needed to check in, that would be two service events. If one guest, at check-in, also mentioned she needed a towel, this would be 1 service event.
Having had some experience in what makes a hotel busy at the front desk, I recorded all potential features that I thought could contribute to the target in the data file. To narrow them down further, I started with a full model, and minimized the AIC to find the best fitting model, which is as follows (this process runs in the R script):
This model also resonnates with my work experience, as these things seem to contribute most to front desk traffic.
The plots produced by the R script demonstrate that this model satisfies all the assumptions for linear regression: errors centered at zero, normality of errors, and homoskedacity. The last two are confirmed by a Shapiro-Wilk test and a Breusch-Pagan test respectively. VIFs for included features had
The adjusted
This analysis determines that three key factors explain over 66% of the variance in busyness at the front desk: arrivals, non-corporate group rooms, and occupancy. This has important implications for staffing: these are the data to look at when considering how many people to schedule at the front desk for a given shift.