tomwhite/covid-19-uk-data

Auto-updating datasette instance

tomwhite opened this issue · 8 comments

It would be great if we could have a datasette instance that was automatically updated (e.g. every hour).

One way of doing it: https://twitter.com/psychemedia/status/1243222423287271424

The three best options for this are:

All three should be essentially free for a small project like this. Which one are you most comfortable with?

I usually use Cloud Run for this kind of project: https://simonwillison.net/2020/Jan/21/github-actions-cloud-run/

I've used Heroku in the past, but have most experience with Google Cloud - so probably Google Cloud Run. I saw https://simonwillison.net/2020/Jan/21/github-actions-cloud-run/ which has lots of useful details, but I was hoping it might be a bit simpler if there's no postprocessing to do.

This repo already has a sqlite database (https://github.com/tomwhite/covid-19-uk-data/blob/master/data/covid-19-uk.db) which is updated whenever the data changes, so it might be simplest to just publish that. Alternatively, it should be straightforward to generate a sqlite database from the CSV files in the data directory.
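That alternative can be sketched with just the Python standard library - a minimal, hypothetical version of loading each CSV in the data directory into its own SQLite table (file and table names here are illustrative, not the ones this repo actually uses):

```python
import csv
import sqlite3
from pathlib import Path


def load_csvs_to_sqlite(csv_dir, db_path):
    """Load every CSV in csv_dir into its own table in a SQLite database."""
    conn = sqlite3.connect(db_path)
    for csv_file in sorted(Path(csv_dir).glob("*.csv")):
        # Derive a table name from the file name, e.g. covid-19-cases.csv -> covid_19_cases
        table = csv_file.stem.replace("-", "_")
        with open(csv_file, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            cols = ", ".join(f'"{c}"' for c in header)
            placeholders = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', reader
            )
    conn.commit()
    conn.close()
```

In practice csvs-to-sqlite does this (plus type inference and foreign-key extraction), so this is only meant to show how little is involved when there's no postprocessing to do.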

My preference is to automate building the database in the action (or CI script or whatever) - here's an example of a build script I wrote that uses csvs-to-sqlite for that: https://github.com/simonw/global-power-plants-datasette/blob/ece947cb869e2786fae5a6a6316ac1a77430cbdf/.travis.yml#L11


If you have a separate mechanism for building the SQLite database then you can skip that though and just run datasette publish against the .db file you've already created.
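For the case where the .db file already exists, the deploy step of the workflow can be as short as a single `datasette publish cloudrun` call. A hypothetical sketch (the GCP project name and service name here are placeholders, and authentication setup is omitted):

```yaml
- name: Deploy to Cloud Run
  run: |-
    gcloud config set project my-gcp-project
    datasette publish cloudrun data/covid-19-uk.db \
      --service covid-19-uk-datasette
```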

That makes sense. I actually already use your csvs-to-sqlite in the processing pipeline for preparing the data in this repo :)

So I think the way forward is to use Google Cloud Run triggered by a GitHub Action, just like you did in your blog post. I will try to work through it in the next couple of days. Thanks for your guidance!

I'm happy to help review your Actions YAML file as you work on it - I find these workflows usually take quite a bit of iterating to get working. Setting up the secrets for Cloud Run is particularly fiddly in my experience.

Hi @simonw, I've managed to set up a GitHub Action that publishes a datasette instance on Cloud Run. It was quite fiddly, but the instructions you published at https://simonwillison.net/2020/Jan/21/github-actions-cloud-run/ were invaluable - thank you for documenting the details so clearly!

I made a few notes of extra things that I thought were worth mentioning:

  • I had to enable the Cloud Build API on Google Cloud. I discovered this after the GitHub workflow failed, and the error message suggested I do this.
  • I had to allow unauthenticated invocations of the Cloud Run service. I discovered this by looking at the Cloud Run logs. The command suggested in the GitHub Action log was:
    • gcloud beta run services add-iam-policy-binding --region=europe-west1 --member=allUsers --role=roles/run.invoker covid-19-uk-datasette
  • I added the decide_variables step later, once I knew the cloud run URL (a bootstrapping problem - I’m not using cloudflare).
  • I'm not using scheduled builds since I want the build to happen every time I commit to the repo.
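That last point just means the workflow is triggered by a push event rather than a schedule - something along these lines (branch name assumed):

```yaml
on:
  push:
    branches: [master]
```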

It's running at https://covid-19-uk-datasette-65tzkjlxkq-ew.a.run.app/

The PR for this is #26, in case you've got any comments. Thanks!

Fixed in #26