Parse (historical) alerts from water treatment companies.
The aim is to provide structured data that can later be used to build dashboards and visualisations.
To date, only Odyssi from Martinique is handled. Contributions are welcome.
- Python 3.8
$ mkvirtualenv water-parser
Download one (or all) page(s) from Odyssi.
$ wget --no-check-certificate https://www.odyssi.fr/coupure/2077 -O /tmp/2077
Run the script:
$ water_parser --input_file /tmp/2077 --print
period_from period_to reason
2020-05-08 2020-05-08 casse
With the --output_dir option, two CSV files are written: one with a row per period and one with a row per day. Example:
$ water_parser --input_file /tmp/2077 --output_dir /tmp
$ ls /tmp/odyssi_*
/tmp/odyssi_days.csv /tmp/odyssi_periods.csv
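The two files are related: each period can be expanded into one row per day it covers. The sketch below shows that expansion, assuming the periods file uses the same columns as the --print output (period_from, period_to, reason), comma-separated ISO dates and inclusive bounds (a one-day period has identical dates, as in the example above). The "day" column name on the output side is hypothetical; the real layout of odyssi_days.csv may differ.

```python
import csv
import io
from datetime import date, timedelta
from typing import Dict, List


def expand_periods(periods_csv: str) -> List[Dict[str, str]]:
    """Expand each (period_from, period_to, reason) row into one row per day.

    Assumption: dates are ISO formatted (YYYY-MM-DD) and both bounds are
    inclusive. The output key "day" is a hypothetical column name.
    """
    days = []
    for row in csv.DictReader(io.StringIO(periods_csv)):
        current = date.fromisoformat(row["period_from"])
        end = date.fromisoformat(row["period_to"])
        while current <= end:  # inclusive upper bound
            days.append({"day": current.isoformat(), "reason": row["reason"]})
            current += timedelta(days=1)
    return days


periods = "period_from,period_to,reason\n2020-05-08,2020-05-09,casse\n"
for row in expand_periods(periods):
    print(row["day"], row["reason"])
```

Here a two-day period produces two day rows, both with the reason casse.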
In the html directory, a Vega-Lite visualisation loads the CSV data (day format) and plots a <month, day> heatmap for the whole period (since 2013).
The heatmap can be exported to PNG or SVG.
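The Vega-Lite spec does the bucketing itself; purely as an illustration of what a <month, day> cell is, here is a Python sketch that maps (ISO day, reason) pairs to cells. It assumes one heatmap row per calendar month (YYYY-MM) and one column per day of the month; the function name heatmap_cells is hypothetical and not part of water_parser.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def heatmap_cells(rows: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, int], str]:
    """Map each (ISO day, reason) pair to a (YYYY-MM, day-of-month) cell."""
    cells = {}
    for iso_day, reason in rows:
        d = date.fromisoformat(iso_day)
        # Row key: year and month; column key: day of the month (1-31).
        cells[(d.strftime("%Y-%m"), d.day)] = reason
    return cells


print(heatmap_cells([("2020-05-08", "casse"), ("2020-05-09", "casse")]))
# {('2020-05', 8): 'casse', ('2020-05', 9): 'casse'}
```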
One drawback: when multiple failure types occur on the same day, only one is kept.
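One possible way to lift that limitation is to group every reason observed on a day instead of keeping a single one, and to flag the days that have more than one. This is a sketch under the assumption that days and reasons are available as (ISO day, reason) pairs; the reason "travaux" is a made-up value for illustration, and how the heatmap would encode several reasons in one cell is not addressed here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def reasons_per_day(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collect every reason seen on each day instead of keeping only one."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for iso_day, reason in rows:
        grouped[iso_day].add(reason)
    return dict(grouped)


# "travaux" is a hypothetical reason used only to create a conflict.
rows = [("2020-05-08", "casse"), ("2020-05-08", "travaux"), ("2020-05-09", "casse")]
grouped = reasons_per_day(rows)
# Days where more than one failure type was reported.
conflicts = {day: sorted(r) for day, r in grouped.items() if len(r) > 1}
print(conflicts)  # {'2020-05-08': ['casse', 'travaux']}
```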
- Extract locations from incident pages. This is not trivial: there are many errors, the granularity varies, and some lists are truncated with an ellipsis (lists ending with ...)
- Create a shortcut to retrieve (and parse) one page, the last n pages, or the pages since a given page id
- Identify whether the same data can be extracted from the smeaux
- Generalize the parser so that it can easily handle more companies
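For the page-retrieval shortcut, the selection part could look like the sketch below. It assumes page ids are consecutive integers, as the URL in the wget example (https://www.odyssi.fr/coupure/2077) suggests, and that the latest id is already known; discovering that id, downloading and parsing are left out. All function names are hypothetical.

```python
from typing import List

# URL pattern taken from the wget example; consecutive ids are an assumption.
BASE_URL = "https://www.odyssi.fr/coupure/{}"


def last_n_page_ids(latest_id: int, n: int) -> List[int]:
    """Ids of the last n pages, never going below id 1."""
    return list(range(max(latest_id - n + 1, 1), latest_id + 1))


def page_ids_since(since_id: int, latest_id: int) -> List[int]:
    """Ids from since_id up to latest_id, both inclusive."""
    return list(range(since_id, latest_id + 1))


def page_urls(ids: List[int]) -> List[str]:
    """Build one incident-page URL per id."""
    return [BASE_URL.format(page_id) for page_id in ids]


print(page_urls(last_n_page_ids(2077, 3)))
# ['https://www.odyssi.fr/coupure/2075', 'https://www.odyssi.fr/coupure/2076', 'https://www.odyssi.fr/coupure/2077']
```

The single-page case is simply page_urls([2077]); each downloaded file could then be passed to water_parser with --input_file.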