Localebnb: An Airbnb Contextual Recommendation App
The motivation for this project was: When booking a private residence, how do you find the perfect neighborhood?
It stemmed from my personal frustrations with Airbnb's search functionality while booking in Montreal. I knew that I wanted to stay in a "trendier" neighborhood, but away from tourists & nightlife. While I could filter Airbnb's search results by neighborhood, I had no idea which neighborhoods met my criteria!
Localebnb aimed to be that contextual recommender for Airbnb.
Using Airbnb listing descriptions (features) + Airbnb's neighborhood guides for traits (target), I built an app that predicts whether a listing is in a neighborhood with a specified trait, then uses that information to score & re-sort the default search results provided by Airbnb.
##How to Use
Note: it is best to use this app on desktop with a large window
- Go to the Localebnb app - note: this project is no longer live as of late 2015...
- Enter your search criteria (city, dates, guests), as well as neighborhood trait preferences ('is artsy', 'has shopping', etc)
- Click "Search Airbnb" - this scrapes Airbnb's search results & listings, predicts the traits for each listing, then scores & re-sorts the search results
- On the search result page, you can re-sort by clicking a column header. You can also change your preferences and see how that changes the search results.
- If you hover over a listing, a pop-up appears on the map. From there you can click through to the listing's Airbnb page and see additional information about the listing's description.
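The score & re-sort step can be pictured with a minimal sketch. This is not the app's actual scoring system: it assumes each scraped result carries a hypothetical `trait_probs` dict of predicted trait probabilities, and it scores a listing as the average probability over the traits the user selected. The names `score_listing` and `resort_results` are made up for illustration.

```python
from typing import Dict, List


def score_listing(trait_probs: Dict[str, float], preferences: List[str]) -> float:
    """Average predicted probability over the traits the user asked for."""
    if not preferences:
        return 0.0
    return sum(trait_probs.get(t, 0.0) for t in preferences) / len(preferences)


def resort_results(results: List[dict], preferences: List[str]) -> List[dict]:
    """Return results ordered by score, highest first.

    sorted() is stable, so listings with equal scores keep the order
    they had in the default search results.
    """
    return sorted(
        results,
        key=lambda r: score_listing(r["trait_probs"], preferences),
        reverse=True,
    )


# Hypothetical results, in the default search order.
results = [
    {"id": "a", "trait_probs": {"artsy": 0.2, "shopping": 0.4}},
    {"id": "b", "trait_probs": {"artsy": 0.9, "shopping": 0.7}},
    {"id": "c", "trait_probs": {"artsy": 0.2, "shopping": 0.4}},
]
ranked = resort_results(results, ["artsy", "shopping"])
print([r["id"] for r in ranked])
```

Because the sort is stable, changing preferences only moves listings whose scores actually differ; ties fall back to the default ordering.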
I scraped 4 types of pages across Airbnb for data:
- Search Result Pages (e.g. https://www.airbnb.com/s/Portland--OR--United-States?checkin=09%2F18%2F2015&checkout=09%2F21%2F2015)
- ~4000 Listing Pages (e.g. https://www.airbnb.com/rooms/14584)
- City Guides for SF & NYC (e.g. https://www.airbnb.com/locations/san-francisco)
- Neighborhood Guides for all neighborhoods (e.g. https://www.airbnb.com/locations/san-francisco/duboce-triangle)
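As a rough illustration of the first page type, a search result URL like the example above can be assembled from a location and a date range. This is a sketch only: the slug rule (`", "` becomes `--`, spaces become `-`) is inferred from that single example URL, `build_search_url` is a hypothetical helper, and the URL scheme is the 2015 one, so it may not match the site today.

```python
from datetime import date
from urllib.parse import quote, urlencode

BASE_URL = "https://www.airbnb.com/s/"


def build_search_url(location: str, checkin: date, checkout: date) -> str:
    """Build a search URL in the style of the 2015 example above."""
    # Assumed slug rule, inferred from one example URL.
    slug = location.replace(", ", "--").replace(" ", "-")
    # urlencode percent-encodes the slashes in the dates (/ -> %2F).
    query = urlencode(
        {
            "checkin": checkin.strftime("%m/%d/%Y"),
            "checkout": checkout.strftime("%m/%d/%Y"),
        }
    )
    return f"{BASE_URL}{quote(slug)}?{query}"


url = build_search_url(
    "Portland, OR, United States", date(2015, 9, 18), date(2015, 9, 21)
)
print(url)
```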
I mapped listings to neighborhoods & neighborhoods to traits to come up with my labeled dataset (listings -> traits). I then cleaned up the descriptions using NLP techniques, vectorized them using TF-IDF, and tried a variety of models on this information. SVMs provided the highest accuracy (~78-82%, a 5 pt lift over a naive Bayes model). Interestingly enough, when I attempted to create a 'majority vote' ensemble (NB + SVM + Random Forest), the accuracy decreased slightly against each of the individual models. This suggests that each of these 3 models picks up certain features that the other 2 do not.
I also ran a Doc2Vec (Word2Vec) model using the cleaned descriptions as sentences & the neighborhood traits & cities as labels. However, my corpus proved too small for the results to be usable in Localebnb. With a much larger training set, I'd love to revisit this method.
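Two of the steps above are easy to show in a few lines of plain Python: joining listings to traits through their neighborhoods, and taking a 'majority vote' across model predictions. This is a sketch under assumptions, not the project's code: the listing IDs, neighborhood names, and trait names are made up, and the TF-IDF vectorization and the scikit-learn models themselves are left out.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def label_listings(
    listing_to_hood: Dict[str, str],
    hood_to_traits: Dict[str, Set[str]],
    trait: str,
) -> Dict[str, int]:
    """Label each listing 1/0 for one trait via its neighborhood."""
    labels = {}
    for listing_id, hood in listing_to_hood.items():
        if hood not in hood_to_traits:
            continue  # no neighborhood guide, so no label for this listing
        labels[listing_id] = int(trait in hood_to_traits[hood])
    return labels


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard vote: for each sample, take the most common label across models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical data for illustration only.
listing_to_hood = {"listing_1": "Hood A", "listing_2": "Hood B", "listing_3": "Hood C"}
hood_to_traits = {"Hood A": {"peace & quiet"}, "Hood B": {"artsy", "nightlife"}}
labels = label_listings(listing_to_hood, hood_to_traits, "artsy")

# One row of 0/1 predictions per model (e.g. NB, SVM, Random Forest).
ensemble = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
print(labels, ensemble)
```

With three binary classifiers a vote can never tie, which is one reason an odd number of models is convenient for this kind of ensemble.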
There are many applications for this data & methodology.
Why Airbnb should implement this:
- User Value: Increase user satisfaction by increasing relevance
- Business Value (revenue): Increase booking rate by reducing bounces (& click fatigue)
- Business Value (content team): Guide creation of neighborhood guides in new cities
- ***Word2Vec Bonus note: The inclusion of a trait model for search results would need to be tested against existing systems. Potential negatives include an increase in options (i.e. the paradox of choice) and/or contextualized search lowering the price of the listings that people book.
reference:
- https://www.airbnb.com/support/article/39
- http://nerds.airbnb.com/location-relevance/
- http://nerds.airbnb.com/host-preferences/
- Scrape more descriptions across more cities beyond SF & NYC (as neighborhood names & major street names were highly predictive in most models)
- Include additional listing information in models
- Make neighborhood traits more fluid by giving partial weight to nearby neighborhoods (utilizing graph analytics)
- Revisit Doc2Vec model on a larger corpus & potential applications of Doc2Vec
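The "partial weight to nearby neighborhoods" idea could look something like the sketch below. It is purely hypothetical, since this step was never built: it assumes a per-neighborhood score for a single trait, an adjacency mapping of nearby neighborhoods, and an arbitrary blending `weight`.

```python
from typing import Dict, List


def smooth_traits(
    trait_scores: Dict[str, float],
    neighbors: Dict[str, List[str]],
    weight: float = 0.25,
) -> Dict[str, float]:
    """Blend each neighborhood's score with the mean score of its neighbors."""
    smoothed = {}
    for hood, own in trait_scores.items():
        nearby = [trait_scores[n] for n in neighbors.get(hood, []) if n in trait_scores]
        if not nearby:
            smoothed[hood] = own  # isolated neighborhood: keep its own score
            continue
        smoothed[hood] = (1 - weight) * own + weight * sum(nearby) / len(nearby)
    return smoothed


# Hypothetical scores for one trait (e.g. 'artsy') and a tiny adjacency list.
scores = {"A": 1.0, "B": 0.0, "C": 0.0}
adjacent = {"A": ["B"], "B": ["A", "C"]}
blended = smooth_traits(scores, adjacent, weight=0.5)
print(blended)
```

Here neighborhood B picks up some of A's score even though its own guide does not list the trait, which is the "fluid" behavior described in the bullet above.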
- iPython & iPython Notebook - IDE for python; used to test code snippets & explore data
- MongoDB - a NoSQL database; used for storing my scrapes
- pymongo - A python wrapper for MongoDB
- Requests - A python library used in scraping tasks for getting webpage html.
- BeautifulSoup - A python html-parsing library. It makes it much easier to pull out particular elements from a complex webpage.
- pickle - A python library for serializing objects; used for saving requests objects for later parsing
- time & datetime - Python libraries for time related functions; used for logging times of scrapes & pausing the scripts between scraping
- A python library for datetime related functions; used for parsing datetime objects in Pandas
- pandas - provides high-performance, easy-to-use data structures and data analysis tools for Python; used for basic data manipulation & some file reading
- NumPy - the fundamental package for scientific computing with Python; used for math functionality
- scikit-learn - data modeling library
- nltk - library for NLP
- Flask - a python framework for creating web apps.
- [gensim's Word2Vec & Doc2Vec](https://radimrehurek.com/gensim/models/word2vec.html) - a modeling library that learns vector representations of words & documents to help discern the meaning of words. While not included in the final app, I used this model for some EDA & testing. With a larger corpus, it's likely that a Doc2Vec model would be used.
- moz - the Google SERP CTR by position served as an inspiration & starting point for my scoring system.
Also, Galvanize (a.k.a. Zipfian Academy) & its instructors for an amazing education.
A special thank you (/slash/ apology) to Airbnb, whose amazing service was an inspiration for this project. I hope you are inspired by Localebnb to explore including neighborhood description search/filtering functionality in your search.
-G Scott Stukey