Quick script to scrape the Christchurch City Council cycling data from https://smartview.ccc.govt.nz/map/layers/ecocounter
The data directory contains a sample of the cycle count data.
TODO: The data may be updated nightly and made available on GCP in parquet and duckdb formats.
To run the scraper script install poetry and Python3.11+.
From a terminal, install the dependencies, enter the virtual environment, and run the main
script with:
poetry install
poetry shell
python main.py
- Python script to get the data and make available as a CSV/parquet file.
- Run nightly (GCP Build Job, Cloud Task) and upload data somewhere public.
- Metabase Dashboard (Add people who care as Admin)
Live version of this graph
After you have run the main.py
script you will have a local duckdb database with current data that
you can query directly.
You can also download mine from https://storage.googleapis.com/hardbyte-ccc/duck.db or, query the parquet data which I commit to the repository.
Install duckdb - https://duckdb.org/docs/installation/index and start a DB console with:
duckdb duck.db
Some example SQL queries:
$ duckdb -s "select name, value from 'data/cycling-counters.parquet' where date = today() order by value desc limit 10;"
┌──────────────────────────┬───────┐
│ name │ value │
│ varchar │ int32 │
├──────────────────────────┼───────┤
│ Eco Display Total Counts │ 1956 │
│ South Hagley Park │ 1821 │
│ North Hagley Park │ 1619 │
│ Railway Cwy │ 904 │
│ University Dr │ 891 │
│ St Asaph St │ 804 │
│ Antigua Street │ 759 │
│ Collins Street │ 702 │
│ Main Rd total cyclists │ 552 │
│ Cashmere Rd │ 550 │
├──────────────────────────┴───────┤
│ 10 rows 2 columns │
└──────────────────────────────────┘
select
sites.name, cycling_counts.date, cycling_counts.value
from
sites, cycling_counts
where
cycling_counts.site = sites.oid
and current_date - cycling_counts.date < 30
Ref https://duckdb.org/docs/sql/window_functions
select "Name", "Date", AVG("value") OVER (
partition by "site"
order by "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING)
AS "7-day Moving Average"
from
sites, cycling_counts
where
cycling_counts.site = sites.oid
order by "Name", "Date"
Duckdb can output to csv/parquet and much much more. Ref https://duckdb.org/docs/sql/statements/copy
COPY (
SELECT
site, date, value, sites.name, sites.x, sites.y
FROM sites, cycling_counts
WHERE
cycling_counts.site = sites.oid
) TO 'data/cycling-counters.parquet' (FORMAT PARQUET);