This is a demo project and template repository showing how I use SpatiaLite with Datasette for quick spatial analysis.
Live show notes: https://docs.google.com/document/d/1z30ZCWGsR_jr6EuHKukJ4kTZQMBryrrEkIgo_5twZjo/edit
SQL is old. It's boring. It's widely adopted and not controlled by a company that might change it or shut it down next year.
SQLite is a database in a single file. It's fast and durable. It's easy, both in code and in my head, to make and delete and remake lots -- hundreds -- of little databases. They're just files.
There are a lot of shiny, brittle things out there. This is cast iron coding.
- `datasette` is an open source multi-tool for exploring and publishing data
- `sqlite-utils` is a suite of utility functions for manipulating SQLite databases
- `geojson-to-sqlite` loads GeoJSON into a SQLite database
- `datasette-geojson` adds GeoJSON output to Datasette
- `datasette-geojson-map` renders a map for geographic queries
- `sqlite-colorbrewer` adds a `colorbrewer` function to SQLite
Run `pipenv install` to create a virtual environment and get the latest version of everything. I also recommend installing SpatiaLite.
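SpatiaLite setup varies by platform. As a sketch, on a Mac with Homebrew or on Debian/Ubuntu, the whole setup looks something like this:

```sh
# Create the virtual environment and install everything in the Pipfile
pipenv install

# Install the SpatiaLite extension (pick the line for your platform)
brew install spatialite-tools                   # macOS, with Homebrew
sudo apt install libsqlite3-mod-spatialite      # Debian/Ubuntu
```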
These libraries aren't included in the `Pipfile` but could be useful:
- `geocode-sqlite` will run a geocoder on every row in a table, saving the results to `latitude` and `longitude` columns (there's a usage sketch just after this list).
- `datasette-leaflet-geojson` renders a map for each feature row.
- `datasette-cluster-map` will cluster points for rows with `longitude` and `latitude` columns. This won't work with `datasette-geojson-map`, so choose one or the other.
- `census-map-downloader` grabs Census boundaries.
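For the first of those, a `geocode-sqlite` run looks roughly like this; the database, table and column names here are placeholders, and the supported geocoders are listed in its README:

```sh
# Geocode every row in a hypothetical "places" table using Nominatim,
# building each address string from columns in the table itself.
# Results are saved to latitude and longitude columns.
pipenv run geocode-sqlite nominatim project.db places \
    --location "{address}, {city}, {state} {zip}" \
    --delay 1   # rate-limit requests to the free API
```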
Over the past year, I've started using this set of libraries to quickly explore, reshape and reformat data for visualization in news projects. My workflow usually involves three phases: loading, exploration and export.
This stack isn't (yet) a replacement for R, GeoPandas or QGIS. Those are good tools, and if you're more comfortable there, I'm not here to talk you out of them.
This approach is, for me, faster than QGIS for most datasets. It allows me to more quickly iterate on finding interesting subsets (because I can write SQL faster than I can navigate a GUI), and it allows me to see intermediate results, because each query generates a map.
For self-contained datasets, I use `geojson-to-sqlite`. (For shapefiles, I usually transform data to GeoJSON first using `ogr2ogr`, but `shapefile-to-sqlite` is also available.) For tabular data, I use `sqlite-utils`.
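Loading a project, then, might look something like this; the file and table names are placeholders:

```sh
# Shapefile to GeoJSON first, via ogr2ogr (part of GDAL)
ogr2ogr -f GeoJSON counties.geojson counties.shp

# GeoJSON features into a SpatiaLite geometry table
pipenv run geojson-to-sqlite project.db counties counties.geojson --spatialite

# Plain tabular data alongside it
pipenv run sqlite-utils insert project.db population population.csv --csv
```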
If you are new to Datasette, this tutorial is a good place to start.
Once you have data in SpatiaLite, every SQL query that includes a `geometry` column will now include a map. (If you're using `datasette-cluster-map`, you'll see a map if your query includes `longitude` and `latitude` columns.) Use queries to see different slices of the data.
Here's a map of counties in Massachusetts.
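That map comes out of a plain SQL query. Assuming a `counties` table with `state` and `geometry` columns (my placeholder names, not anything required), it's something along these lines, which also runs from the command line:

```sh
# In the Datasette UI, this query renders as a map because it returns a
# geometry column. From the CLI, AsGeoJSON() makes the output readable.
pipenv run sqlite-utils query project.db \
    --load-extension spatialite \
    "select name, AsGeoJSON(geometry) as geometry
     from counties
     where state = 'MA'"
```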
Using `datasette-geojson-map` lets you style map features using Mapbox's simplestyle-spec. GitHub also supports this for GeoJSON and TopoJSON files.
With `sqlite-colorbrewer`, it's possible to color features based on data.
Here are countries in Asia, in color.
Combine those, and we can start using Datasette for exploratory visualization:
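As one hypothetical example, assuming a `countries` table with a small integer `bucket` column (0 through 4), the query behind a map like the one above could look like this:

```sh
# Start Datasette with SpatiaLite; sqlite-colorbrewer registers its SQL
# function with Datasette once it's installed in the environment.
pipenv run datasette project.db --load-extension spatialite

# Then, in the Datasette UI, a query along these lines colors each feature:
# colorbrewer(scheme, number_of_classes, index) returns a CSS color, and
# datasette-geojson-map reads the "fill" column as a simplestyle property.
#
#   select
#     name,
#     geometry,
#     colorbrewer('Set2', 5, bucket) as fill
#   from countries
```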
Finally, let's get data out of exploration mode and into something more presentable.
This stack is built for reporting, not for storytelling.
For interactive maps, I recommend exporting a version of your data as GeoJSON and building static versions for presentation with Leaflet or Mapbox. For more advanced visualizations, look at d3-geo.
Datasette allows queries to run from the command line using the `--get` option, which will hit any Datasette URL and return the result. Combine this with canned queries for scripted exports. I usually put this in a Makefile:
```make
combined.geojson: project.db
	pipenv run datasette project.db --get /project/combined.geojson \
		--load-extension spatialite \
		--setting sql_time_limit_ms 5000 \
		--setting max_returned_rows 20000 \
		-m metadata.yml > $@
```
That assumes I've loaded whatever I need into `project.db` and created a canned query called `combined` (and then uses `datasette-geojson` for GeoJSON output).
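For completeness, here's the shape of a canned query in `metadata.yml`; the database name, query name and SQL are all placeholders:

```yaml
# metadata.yml: a "combined" canned query on the project database
databases:
  project:
    queries:
      combined:
        title: Combined export
        sql: |-
          select
            name,
            geometry
          from counties
```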
My goal in this step is to end up with a file that won't need any further filtering or processing once I pull it into a frontend tool.
Data in the examples above comes from these sources:
- US Census data from NHGIS or Census Reporter
- All the Places
- US Drought Monitor
- Earthquakes
- Wildfires
- Flood hazards
- California Protected Lands
- Central Park Squirrel Census
- Manatee Carcass Recovery Locations in Florida