Where can I book a freaking tennis court in Raincouver? -- Tennis Buddha
This project answers that universal question in one place: a single webpage that displays tennis court vacancies in Vancouver.
Currently supported venues:
- Burnaby - Burnaby Tennis Club
- Richmond - Tennis BC Hub
- Vancouver - UBC
- North Vancouver - Tennis Centre
- Coquitlam - The Tennis Centre
- Surrey - The Tennis Centre
- Langley - The Tennis Centre
- Parse booking page from Bnb - BTC
- Parse booking page from Coq - TTC
- Parse booking page from Surrey - TTC
- Parse booking page from Langley - TTC
- Parse booking page from Rmd - Hub
- Parse booking page from Van - UBC
- Parse booking page from North Van
- Define a data storage format (JSON)
- Build an HTML page to load data and render (mobile friendly)
- Deploy somewhere and get HTML exposed
- Add a background job to update data every five minutes
- Error handling
- Enable network retry
- Collapsed views by default
- Build a query interface to ask for vacancies on specific dates
High level
- bin/run is the script that scrapes venue websites and stores the data in runner-data.json. It's configured to run every 5 minutes by GitHub Actions.
- index.html is the static page that loads and renders the data. It's hosted on GitHub Pages.
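As a rough illustration of the store step in bin/run, the sketch below sorts slot records and serializes them for the static page. It assumes each record is a hash keyed by the sort fields listed under Details, that times are zero-padded 24-hour strings (so string order matches time order), and that the page reads a global named RUNNER_DATA; the sample records, the helper names and the global name are all hypothetical. It produces the .js form that Details describes rather than plain JSON.

```ruby
require "json"

# Hypothetical slot records; field names follow the sort key listed under Details.
slots = [
  { date: "2024-05-02", start_time: "09:00", end_time: "10:00", court_info: "Court 2" },
  { date: "2024-05-01", start_time: "18:00", end_time: "19:00", court_info: "Court 1" },
  { date: "2024-05-01", start_time: "09:00", end_time: "10:00", court_info: "Court 3" },
]

# Sort by (date, start_time, end_time, court_info).
# Ruby compares arrays element by element, so the array acts as a tuple key.
def sort_slots(slots)
  slots.sort_by { |s| s.values_at(:date, :start_time, :end_time, :court_info) }
end

# Wrap the JSON in a JavaScript assignment so the page can load it with a
# plain <script> tag. RUNNER_DATA is an assumed global name.
def to_js(slots, var_name = "RUNNER_DATA")
  "var #{var_name} = #{JSON.generate(slots)};\n"
end

sorted = sort_slots(slots)
js = to_js(sorted)
puts js
```

Writing the result is then a single call such as File.write("runner-data.js", js).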
Details
- Scraping strategy: prefer a simple request to a data endpoint over logging in and parsing pages. First try to find a data endpoint; if there is none, use mechanize to parse the page; as a last resort, use Selenium WebDriver to run the page's JavaScript.
- Data is actually stored in runner-data.js instead of runner-data.json to avoid the CORS check, which would require the JSON file to be loaded from a server.
- Data is mostly massaged in the back end and served directly to the front end.
- Data is sorted by (date, start_time, end_time, court_info).
- mechanize is the main scraping framework used.
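The scraping strategy above is an ordered fallback, which can be sketched as a list of callables tried in turn. This is only an illustration: scrape_with_fallback, ScrapeFailed and the three stand-in lambdas are hypothetical names. Real strategies would hit the venue's data endpoint, drive mechanize, or drive Selenium WebDriver; none of those calls are shown here.

```ruby
# Raised when every strategy has failed (hypothetical error class).
class ScrapeFailed < StandardError; end

# Try each strategy in insertion order and return the first non-nil result.
# A strategy signals failure by raising or by returning nil.
def scrape_with_fallback(strategies)
  errors = []
  strategies.each do |name, strategy|
    begin
      result = strategy.call
      return { strategy: name, slots: result } unless result.nil?
      errors << "#{name}: no data"
    rescue StandardError => e
      errors << "#{name}: #{e.message}"
    end
  end
  raise ScrapeFailed, errors.join("; ")
end

# Stand-in strategies, cheapest first. The bodies are placeholders.
strategies = {
  data_endpoint: -> { raise "endpoint requires login" },
  mechanize_page: -> { [{ date: "2024-05-01", start_time: "09:00", end_time: "10:00", court_info: "Court 1" }] },
  selenium: -> { raise "should not be reached" },
}

outcome = scrape_with_fallback(strategies)
puts outcome[:strategy]
```

Keeping the strategies in a hash keeps the order explicit, and the collected error messages show why the cheaper options were skipped.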
Others
- Secrets are managed by direnv.
- Use record: :new_episodes to record new VCR requests.
- Add a new entry to venues.json.
- Add a new scraper in lib/new_venue_scraper.rb with a test in test/new_venue_scraper_test.rb.
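For orientation, a new scraper might look roughly like the skeleton below. Everything in it is an assumption made for illustration: the class name, the parse method, the shape of the venue's response and the output fields (chosen to match the sort key under Details). The project's actual scraper interface and the venues.json format are not shown here.

```ruby
require "json"

# Hypothetical skeleton for lib/new_venue_scraper.rb.
class NewVenueScraper
  VENUE = "New Venue" # assumed to match the entry added to venues.json

  # body: raw JSON string from the venue's (assumed) availability endpoint.
  # Returns an array of slot hashes in the shape used for sorting.
  def parse(body)
    JSON.parse(body).fetch("slots", []).map do |raw|
      {
        venue: VENUE,
        date: raw.fetch("date"),
        start_time: raw.fetch("start"),
        end_time: raw.fetch("end"),
        court_info: raw.fetch("court"),
      }
    end
  end
end

# Made-up sample response, used only to exercise the parser.
sample = '{"slots":[{"date":"2024-05-01","start":"09:00","end":"10:00","court":"Court 1"}]}'
p NewVenueScraper.new.parse(sample)
```

The matching test in test/new_venue_scraper_test.rb would feed a recorded response to the scraper and assert on the parsed slots.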
GitHub Actions
The shortest interval at which GitHub Actions can run a scheduled workflow is once every 5 minutes, but the frequency is not guaranteed. In practice, I noticed runs happen about every 10 minutes on average.