Remote Work Location Recommender

Executive Summary
Data Dictionaries
Data Acquisition and Cleaning
Exploratory Data Analysis
Building Recommendation Functions
Conclusion and Next Steps

Executive Summary

Problem Statement:

I work at ABC Marketing, Inc. and we just signed Airbnb as our latest client.

They’ve asked us to build a web application that is easy to use and will drive repeat traffic to their website.

Given that Forbes has projected 25% of all professional jobs in North America will be remote by the end of 2022, our goal is to create a ready-for-launch remote work location recommendation system that drives users directly to Airbnb’s page.

Data Dictionaries

Because the project uses several datasets, each data dictionary lives in its dataset's directory and is linked separately here:

Data dictionary for Airbnb Dataset
Data dictionary for Chain Restaurant Dataset
Data dictionary for Cost of Living Dataset
Data dictionary for Weather Dataset
Data dictionary for Walkability Dataset

Data Acquisition and Cleaning

Notebook can be viewed here.

To build a system that recommends a location based on user input, we needed to compile data that corresponded to the questions we expected to ask the user. Below are brief summaries of the datasets used and their sources.

Datasets:

Airbnb Data - 279k observations

13 features for each Airbnb listing including neighborhood, price per night, and minimum nights required for booking, all downloaded from Inside Airbnb.

Chain Restaurant Data - 155k observations

13 features relating to each restaurant listed, including name, type of cuisine, urban area location, and whether the restaurant qualifies as a chain. Data provided courtesy of Friendly Cities Lab.

Cost of Living Data - 33 observations

16 features relating to various socioeconomic and political characteristics of each location. Data was pulled from BestPlaces.net for each location and aggregated.

Weather Data - 33 observations

Two separate datasets detailing monthly temperature and rainfall trends for each location, from The National Centers for Environmental Information. Data was aggregated into one larger dataset for subsequent usage.

Walkability Data - 40k observations

46 features for each census block including population demographics, auto and transit scores, and walkability from the US Environmental Protection Agency.

Exploratory Data Analysis

Notebook can be viewed here.

The purpose of this notebook is to explore each of the datasets, helping us home in on the features that are useful in our functions and identify any patterns or inconsistencies that might cause problems when making recommendations. The following are examples of the questions explored:

Key Findings

Which locations have the most expensive Airbnb listings?

Which locations have the least expensive Airbnb listings?

Which locations have the highest ratios of chain restaurants?

Which locations have the lowest ratios of chain restaurants?

What are the most common chains across all locations?

Are there more chains in locations where the grocery cost index is higher?

Which locations have the highest costs of living?

Which locations have the lowest costs of living?

Do we see a relationship between cost of living and political lean?
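As an illustration of how one of these questions could be answered, here is a minimal sketch of a per-location chain ratio. It assumes "ratio" means chain restaurants divided by all restaurants in a location, and that each record is a hypothetical `(location, is_chain)` pair; the actual column names and notebook code may differ.

```python
from collections import defaultdict


def chain_ratio_by_location(restaurants):
    """Share of restaurants flagged as chains, per location.

    `restaurants` is an iterable of (location, is_chain) pairs
    (a hypothetical record layout used only for this sketch).
    """
    totals = defaultdict(int)
    chains = defaultdict(int)
    for location, is_chain in restaurants:
        totals[location] += 1
        chains[location] += int(bool(is_chain))
    return {location: chains[location] / totals[location] for location in totals}


def rank_locations(ratios, highest_first=True):
    """Location names ordered by chain ratio."""
    return sorted(ratios, key=ratios.get, reverse=highest_first)
```

Calling `rank_locations` with `highest_first=False` gives the locations with the lowest chain ratios instead.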

Building Recommendation Functions

Notebook can be viewed here.

We built four functions in total, briefly summarized below:

User Input Conversion

Takes in responses from a live user and converts them into usable metrics.
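A minimal sketch of what such a conversion might look like. The question keys (`budget`, `climate`, `walkability`), the answer options, and the 0-to-1 scales are all hypothetical placeholders, not the questions or scales used in the project's app.

```python
# Hypothetical answer-to-score mappings; the real app's questions may differ.
ANSWER_SCALES = {
    "budget": {"low": 0.0, "medium": 0.5, "high": 1.0},
    "climate": {"cold": 0.0, "mild": 0.5, "hot": 1.0},
    "walkability": {"not important": 0.0, "somewhat": 0.5, "very important": 1.0},
}

# Fixed feature order so the vector lines up with the location features.
FEATURE_ORDER = ["budget", "climate", "walkability"]


def convert_user_input(responses):
    """Turn a dict of survey answers into a numeric feature vector."""
    vector = []
    for feature in FEATURE_ORDER:
        answer = responses[feature].strip().lower()
        if answer not in ANSWER_SCALES[feature]:
            raise ValueError(f"Unexpected answer {answer!r} for {feature!r}")
        vector.append(ANSWER_SCALES[feature][answer])
    return vector
```

Keeping the feature order fixed matters because similarity measures compare vectors position by position.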

Cosine Similarity for Single Recommendation

Finds the single location with the highest cosine similarity to the user's input and returns it as a recommendation.

Cosine Similarity for Multiple Recommendations

Finds the five locations with the highest cosine similarity to the user's input and returns all five as recommendations.
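Both cosine-based functions can be sketched with one helper, where `n=1` gives the single recommendation and `n=5` gives the multiple-recommendation variant. This is a dependency-free illustration, not the project's code; the function names and the made-up feature vectors are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def recommend(user_vector, locations, n=1):
    """Return the n location names most similar to the user's vector."""
    ranked = sorted(
        locations,
        key=lambda name: cosine_similarity(user_vector, locations[name]),
        reverse=True,
    )
    return ranked[:n]


# Made-up feature vectors, for illustration only.
example_locations = {
    "City A": [0.9, 0.2, 0.8],
    "City B": [0.3, 0.9, 0.4],
    "City C": [0.8, 0.3, 0.7],
}
```

With a user vector of `[1.0, 0.2, 0.9]`, `recommend(..., n=1)` picks "City A", and `n=2` adds "City C". Because cosine similarity compares direction rather than magnitude, features on very different scales would likely need to be normalized first.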

KMeans Cluster for Randomized Recommendation

Uses KMeans to group the locations, then returns a random recommendation from the list of locations that were in the same cluster.
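A minimal sketch of the idea, using a plain-Python k-means (Lloyd's algorithm) in place of a library KMeans so the example has no dependencies. It assumes "the same cluster" means the cluster whose centroid is nearest to the user's vector; the function names, `k`, and `seed` parameters are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                updated.append(centroids[c])  # empty cluster: keep old centroid
        if updated == centroids:
            break  # centroids are stable, so assignments can no longer change
        centroids = updated
    return centroids, labels


def recommend_from_cluster(user_vector, locations, k=2, seed=0):
    """Random location from the cluster nearest to the user's vector."""
    names = list(locations)
    centroids, labels = kmeans([locations[n] for n in names], k, seed=seed)
    user_cluster = nearest(user_vector, centroids)
    candidates = [n for n, lab in zip(names, labels) if lab == user_cluster]
    return random.choice(candidates) if candidates else None
```

The random pick means repeated calls can return different locations from the same cluster, which is the behavior the summary above describes.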

We selected the user input conversion and single-recommendation cosine functions to use in our Streamlit web app.

Conclusion and Next Steps

Conclusions

Combining multiple datasets comes with challenges

Although we made sure to pull data only from reputable sources, a few of our datasets were reported at the neighborhood level, while others were at the city level. This is not inherently an issue, but it does slightly diminish the interpretability of our findings and makes validation significantly more work.

There are likely patterns in the data that were overlooked

Due to the timeline and scope of the project, we weren't able to explore each of the datasets to the degree that we would have liked. Given that we identified a probable regional trend in our initial EDA, the data likely contains more patterns that could be used in future versions of our functions.

The application works, but could be improved

While we were able to build an application that provides location recommendations, we feel that a few changes (detailed in the Next Steps section below) could increase the likelihood of repeat traffic from users.

In all, we would consider the problem statement to be mostly met, but with a few necessary revisions before presenting to the client and going to market.

Next Steps

Added locations

Ideally, we would recommend cities in every state and, eventually, international locations as well.

NLP on Airbnb listing names and descriptions

Run the words used in each listing name and description through scikit-learn's CountVectorizer or TfidfVectorizer, and from there, generate a list of selected words that users can choose from to match their preferences.
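A rough, dependency-free sketch of the idea: count how many listing names each word appears in and surface the most common ones as candidate choices. This stands in for a proper vectorizer and is not part of the project; the stop-word list, the function name, and the sample listing names are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "the", "to", "with", "of", "for", "near"}


def top_listing_words(listing_names, n=5):
    """Return the n words that appear in the most listing names."""
    doc_freq = Counter()
    for name in listing_names:
        # Use a set so each word counts once per listing.
        words = set(re.findall(r"[a-z]+", name.lower())) - STOP_WORDS
        doc_freq.update(words)
    # Sort by count (descending), then alphabetically so ties are stable.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


sample_names = [
    "Cozy loft near downtown",
    "Quiet loft with a view",
    "Cozy cabin in the woods",
]
```

Here `top_listing_words(sample_names, n=2)` returns `["cozy", "loft"]`, since each appears in two of the three names.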

More data

The typical data science response: more information is needed, specifically around local communities and activities (e.g., the music scene, for users who want to live somewhere with multiple venues or more opportunities to see live music), nightlife, population diversity, and so on. All of this would help us provide more tailored recommendations.

Web Application

To take a look at the finalized version of our web app for Airbnb, please click here.

Presentation

See here for a brief, fairly non-technical presentation summarizing our data collection process, exploration, and final recommendation system.

Works Cited

Please see here for an exhaustive list of resources used in this project.

cynthia-rodriguez1/wfh-location-recommender
