As someone with elderly parents, I would like to know when https://houstonemergency.org/covid-19-vaccines/, or any other vaccine info web page, is updated so that I can tell my parents to go sign up for a vaccine.
- build vs. buy: Googling for website monitoring turns up only services that check at intervals on the order of tens of minutes or hours. Build, because it's easy and simple, and I want monitoring on the order of 1 minute.
- We can interpolate the epoch date into the filename to save versions of the page, e.g.
echo "haha" >> "$(date +%s).txt"
- The page is ~14 KB according to
curl -so /dev/null https://houstonemergency.org/covid-19-vaccines/ -w '%{size_download}'
- Run on VM or in container. This should cost just a few dollars/month since it's such a lightweight process.
- Don't use persistent datastore b/c #bvp
- Save html contents of https://houstonemergency.org/covid-19-vaccines/ to memory.
- Wait 60s
- wget https://houstonemergency.org/covid-19-vaccines/ again and compare to previous version
- Save latest version to memory and repeat
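The four steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the actual `main.py`: the names `fetch`, `check_once`, and `monitor` are hypothetical, and it assumes a plain byte-for-byte comparison is good enough (a page that embeds timestamps or tokens would trigger false positives).

```python
import time
import urllib.request

URL = "https://houstonemergency.org/covid-19-vaccines/"
INTERVAL_SECONDS = 60  # wait between checks, per the steps above


def fetch(url=URL):
    """Download the page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def check_once(previous, fetcher=fetch):
    """Fetch the page once and return (changed, latest).

    `changed` is False on the very first check, when there is
    nothing to compare against yet.
    """
    latest = fetcher()
    changed = previous is not None and latest != previous
    return changed, latest


def monitor(max_checks=None, interval=INTERVAL_SECONDS, fetcher=fetch):
    """Poll forever (or max_checks times), keeping only the last version in memory."""
    previous = None
    checks = 0
    while max_checks is None or checks < max_checks:
        changed, previous = check_once(previous, fetcher)
        if changed:
            print("Page changed at", time.strftime("%Y-%m-%d %H:%M:%S"))
        checks += 1
        # Skip the final sleep when a bounded run is about to end.
        if max_checks is None or checks < max_checks:
            time.sleep(interval)
```

Calling `monitor(max_checks=3)` would do a short bounded run, and `monitor()` would loop indefinitely. Passing `fetcher` as a parameter is only there so the comparison logic can be exercised without hitting the network.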
- progress
- POC working:
python main.py debug to do 3 checks w/o checks
python main.py for infinite looping with ~60 seconds between loops
- Wild guess is that this is cheaper than BVP b/c we're using a lambda function, but having a persistent datastore could add cost.
- Save https://houstonemergency.org/covid-19-vaccines/ as $(date +%s)_index.html and as previous_index.html
- Each time we scrape the page, we diff the latest results with previous_index.html and then overwrite previous_index.html with the latest
- If there's a diff, send a notification to the recipient defined by a config file or environment variable. Email is easy.
- Set this up as a cron job running during normal waking hours, 05:00 - 01:00 (b/c we're talking about a covid vaccine)
- Detect if an anchor tag has been added and send the URL in the notification
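The snapshot, diff, and anchor-tag steps above could look something like the sketch below. It is an illustration under assumptions, not the project's code: `LinkCollector`, `extract_links`, `new_links`, and `save_and_diff` are hypothetical names, files are written to a local directory (a lambda function would need some other storage), and "anchor tag added" is interpreted as "an `href` appears that was not on the previous version".

```python
import difflib
import time
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all anchor hrefs in the order they appear on the page."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def new_links(old_html, new_html):
    """Return hrefs present in new_html but not in old_html, in page order."""
    seen = set(extract_links(old_html))
    added = []
    for href in extract_links(new_html):
        if href not in seen:
            seen.add(href)
            added.append(href)
    return added


def save_and_diff(html, directory="."):
    """Write <epoch>_index.html, diff against previous_index.html, then overwrite it.

    Returns (diff_lines, added_links); both are empty on the first run
    or when nothing changed.
    """
    directory = Path(directory)
    previous_path = directory / "previous_index.html"
    snapshot_path = directory / f"{int(time.time())}_index.html"
    snapshot_path.write_text(html, encoding="utf-8")

    diff_lines, added = [], []
    if previous_path.exists():
        old_html = previous_path.read_text(encoding="utf-8")
        diff_lines = list(difflib.unified_diff(
            old_html.splitlines(), html.splitlines(),
            fromfile="previous_index.html", tofile=snapshot_path.name,
            lineterm=""))
        added = new_links(old_html, html)
    previous_path.write_text(html, encoding="utf-8")
    return diff_lines, added
```

A non-empty `diff_lines` would be the trigger for the notification, and `added_links` could go straight into the message body so the sign-up URL is one tap away.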