Auto-updating datasette instance
tomwhite opened this issue · 8 comments
It would be great if we could have a datasette instance that was automatically updated (e.g. every hour).
One way of doing it: https://twitter.com/psychemedia/status/1243222423287271424
The three best options for this are:
- Heroku
- Google Cloud Run
- https://fly.io
All three should be essentially free for a small project like this. Which one are you most comfortable with?
I usually use Cloud Run for this kind of project: https://simonwillison.net/2020/Jan/21/github-actions-cloud-run/
I've used Heroku in the past, but have most experience with Google Cloud - so probably Google Cloud Run. I saw https://simonwillison.net/2020/Jan/21/github-actions-cloud-run/ which has lots of useful details, but I was hoping it might be a bit simpler if there's no postprocessing to do.
This repo already has a sqlite database (https://github.com/tomwhite/covid-19-uk-data/blob/master/data/covid-19-uk.db) which is updated whenever the data changes, so it might be simplest to just publish that. Alternatively, it should be straightforward to generate a sqlite database from the CSV files in the data directory.
My preference is to automate the building of the database in the action (or CI script or whatever) - here's an example of a build script I wrote that uses csvs-to-sqlite for that: https://github.com/simonw/global-power-plants-datasette/blob/ece947cb869e2786fae5a6a6316ac1a77430cbdf/.travis.yml#L11
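As a rough illustration of what a build step like this does, here is a minimal Python sketch that uses only the standard library. It is not csvs-to-sqlite itself: unlike the real tool it makes no attempt to infer column types and stores every value as text. The function name csvs_to_sqlite and the one-table-per-file layout are assumptions made for this example.

```python
import csv
import sqlite3
from pathlib import Path


def csvs_to_sqlite(csv_dir, db_path):
    """Load every *.csv file in csv_dir into its own table in db_path.

    Hypothetical helper for illustration only. Returns a dict mapping
    table name -> number of rows inserted.
    """
    counts = {}
    conn = sqlite3.connect(str(db_path))
    try:
        for csv_file in sorted(Path(csv_dir).glob("*.csv")):
            table = csv_file.stem  # table is named after the file
            with csv_file.open(newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if not header:
                    continue  # skip empty files
                # Rows whose length does not match the header are dropped
                rows = [row for row in reader if len(row) == len(header)]

            # Quote identifiers so names containing hyphens or spaces work
            qtable = '"{}"'.format(table.replace('"', '""'))
            cols = ", ".join(
                '"{}"'.format(col.replace('"', '""')) for col in header
            )
            placeholders = ", ".join("?" for _ in header)

            # Rebuild the table from scratch on every run
            conn.execute("DROP TABLE IF EXISTS {}".format(qtable))
            conn.execute("CREATE TABLE {} ({})".format(qtable, cols))
            conn.executemany(
                "INSERT INTO {} VALUES ({})".format(qtable, placeholders),
                rows,
            )
            counts[table] = len(rows)
        conn.commit()
    finally:
        conn.close()
    return counts
```

Calling csvs_to_sqlite("data", "covid-19-uk.db") from a CI step would, under these assumptions, produce a database file that datasette can serve.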
If you have a separate mechanism for building the SQLite database then you can skip that though and just run datasette publish against the .db file you've already created.
That makes sense. I actually already use your csvs-to-sqlite in the processing pipeline for preparing the data in this repo :)
So I think the way forward is to use Google Cloud Run triggered by a GitHub Action, just like you did in your blog post. I will try to work through it in the next couple of days. Thanks for your guidance!
I'm happy to help review your action YML file as you work on it - I find they usually take quite a bit of iterating to get them working. Setting up the secrets for Cloud Run is particularly fiddly in my experience.
Hi @simonw, I've managed to set up a GitHub Action that publishes a datasette instance on Cloud Run. It was quite fiddly, but the instructions you published at https://simonwillison.net/2020/Jan/21/github-actions-cloud-run/ were invaluable - thank you for documenting the details so clearly!
I made a few notes of extra things that I thought were worth mentioning:
- I had to enable the Cloud Build API on Google Cloud. I discovered this after the GitHub workflow failed, and the error message suggested I do this.
- I had to allow unauthenticated invocations of the Cloud Run service. I discovered this by looking at the Cloud Run logs. The command suggested in the GitHub Action log was:
gcloud beta run services add-iam-policy-binding --region=europe-west1 --member=allUsers --role=roles/run.invoker covid-19-uk-datasette
- I added the decide_variables step later, once I knew the Cloud Run URL (a bootstrapping problem - I'm not using Cloudflare).
- I'm not using scheduled builds since I want the build to happen every time I commit to the repo.
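For anyone hitting the unauthenticated-invocations problem above, a small anonymous request can confirm whether the service is publicly reachable after a deploy. This is only a sketch: the function names describe_status and check_public are made up for this example, and it assumes that a 401 or 403 response means the run.invoker role has not been granted to allUsers.

```python
import urllib.error
import urllib.request


def describe_status(status):
    """Map an HTTP status code to a short diagnosis (hypothetical helper)."""
    if 200 <= status < 300:
        return "public"
    if status in (401, 403):
        # Assumption: the service rejects anonymous callers
        return "not public: unauthenticated invocations are probably not allowed"
    return "unexpected status {}".format(status)


def check_public(url, timeout=10):
    """Fetch url without credentials and return a diagnosis string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is on the exception
        return describe_status(err.code)
```

Passing the service URL to check_public right after the deploy step would surface the permission problem without having to open the Cloud Run logs.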
It's running at https://covid-19-uk-datasette-65tzkjlxkq-ew.a.run.app/
The PR for this is #26, in case you've got any comments. Thanks!