A series of fixes for Data originating from Northern Ireland
This is very much a work in progress and I'll only really add to it as things annoy me, but if you have a data set that's messy, and you can use similar workflows as demonstrated in this repo, please feel free to use this and PR the crap out of it.
I make no claims of liability, credit, or property over any of this, and make no promises that anything will ever work or be fixed.
Source: https://www.nisra.gov.uk/publications/weekly-deaths
- Weekly death registrations in Northern Ireland, from 2009 - 2020
- Also includes tracking of 2020 COVID outbreak
- The Usual Boilerplate (i.e. non numerical footers, non-merged multi row headers, overly complex column headers, inconsistent structure)
a typo in the 2014 Week Start date (Which actually inspired me to make this repo in the first place)(This was fixed)Then the updates went into new files rather than reusing the same URLs, which makes this whole thing more of a pain(This was also fixed)
from tornamona.fixes import nisra
dataset = nisra.WeeklyDeaths().get().clean()
dataset.data.head().to_markdown()
Week | Week Start | Week End | Total Deaths | Average Deaths for previous 5 years | Min 5 year deaths | Max 5 year deaths | Respiratory Deaths | Average Respiratory Deaths for previous 5 years | COVID19 Deaths | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2008-12-27 00:00:00 | 2009-01-02 00:00:00 | 373 | 332.4 | 309 | 364 | nan | nan | nan |
1 | 2 | 2009-01-03 00:00:00 | 2009-01-09 00:00:00 | 454 | 329.2 | 302 | 377 | nan | nan | nan |
2 | 3 | 2009-01-10 00:00:00 | 2009-01-16 00:00:00 | 388 | 310.2 | 290 | 340 | nan | nan | nan |
3 | 4 | 2009-01-17 00:00:00 | 2009-01-23 00:00:00 | 402 | 324 | 281 | 367 | nan | nan | nan |
4 | 5 | 2009-01-24 00:00:00 | 2009-01-30 00:00:00 | 353 | 305.6 | 272 | 333 | nan | nan | nan |
Everything that has ever pissed me off about open data
This package was created with Cookiecutter
and the audreyr/cookiecutter-pypackage
project template.