Financial-Times/coronavirus-excess-mortality-data

Missing States in USA

Opened this issue · 14 comments

TRM13 commented

I downloaded the CSV file, and while going over it I noticed that a number of US states are missing:

Alabama,Alaska,Arizona,Arkansas,California,Connecticut,Delaware,Florida,Georgia,Hawaii,Idaho,Indiana,Iowa,Kansas,Kentucky,Maine,Minnesota,Mississippi,Missouri,Montana,Nebraska,Nevada,New_Hampshire,New_Mexico,North_Carolina,North_Dakota,Ohio,Oklahoma,Oregon,Pennsylvania,Rhode_Island,South_Carolina,South_Dakota,Tennessee,Texas,Utah,Vermont,Virginia,West_Virginia,Wisconsin,Wyoming

If it is "total deaths" and a baseline you want, you could try the CDC "Pneumonia and Influenza" page, which has downloads. I've done that and parsed it using a 4-year average as the baseline instead of the 5-year average you use, but the idea is similar.
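As a rough sketch of that baseline approach (average each week over the previous 4 years, then compare the current year against it), here is some Python. It assumes weekly total deaths are already loaded into a dict keyed by `(year, week)`; the function name `excess_vs_baseline` and that layout are hypothetical, not the script in the Dropbox folder, and the toy numbers are made up.

```python
from statistics import mean


def excess_vs_baseline(deaths, year, weeks, n_years=4):
    """Compare total deaths in `year` over `weeks` with the average of the
    same weeks in the previous `n_years` years.

    `deaths` maps (year, week) -> total deaths (hypothetical layout).
    Returns (excess_deaths, percent_above_baseline).
    """
    observed = sum(deaths[(year, w)] for w in weeks)
    # Baseline: average each week over the prior n_years, then sum the weeks.
    baseline = sum(
        mean(deaths[(year - k, w)] for k in range(1, n_years + 1))
        for w in weeks
    )
    excess = observed - baseline
    return excess, 100.0 * excess / baseline


# Made-up data: 1,000 deaths per week in 2016-2019, 1,100 per week in 2020.
toy = {(y, w): 1000 for y in range(2016, 2020) for w in range(1, 17)}
toy.update({(2020, w): 1100 for w in range(1, 17)})
print(excess_vs_baseline(toy, 2020, range(1, 17)))  # weeks 1 to 16
```

Switching to a 5-year baseline is just `n_years=5`, provided the extra year of data is present.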

For weeks 1 to 16 of 2020, US deaths were about 5.5% (50,331 "excess deaths") above the previous 4-year average. For comparison, I checked the first 16 weeks of 2018 against its own previous 4-year average, and deaths were 7.2% (63,260) above it.
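To make the arithmetic behind these figures explicit, one way to write it is below, where $D_{y,w}$ is total deaths in year $y$, week $w$ (notation introduced here for illustration). Because 5.5% and 7.2% are rounded, the implied baselines are only approximate.

```latex
B_{2020} = \sum_{w=1}^{16} \frac{1}{4} \sum_{k=1}^{4} D_{2020-k,\,w},
\qquad
E_{2020} = \sum_{w=1}^{16} D_{2020,\,w} - B_{2020},
\qquad
\frac{E_{2020}}{B_{2020}} \approx 0.055
\;\Rightarrow\;
B_{2020} \approx \frac{50{,}331}{0.055} \approx 915{,}000,
\qquad
B_{2018} \approx \frac{63{,}260}{0.072} \approx 879{,}000
```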

The script and all related files are here if you want to kick the tires:
https://www.dropbox.com/sh/fh9x5fngmfbeiiu/AAAH-OtOMqiY_R9qqG6YccCRa?dl=0

I also came here looking for North Carolina. The NCHS dashboard shows "insufficient data" for NC on the map, but (1) Ohio shows the same thing, and Ohio is included in the repo data, and (2) a data download from that page seems fine.

TRM13 commented

North Carolina finally got past the 20% limit and is now included. I'm sure more updates will follow. I also did a first pull of weeks 1-21: excess deaths come in at 6.8%, but 8 weeks of data are missing: West Virginia (1), Connecticut (3) and, of course, North Carolina (4). LOL.
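A tally of missing weeks like the one above can be produced by comparing each state's reported weeks against the full range. This sketch assumes a hypothetical mapping from state name to the set of week numbers that have data; it does not model the CDC's 20% inclusion threshold, and the example data is made up.

```python
def missing_weeks(reported, first_week=1, last_week=21):
    """Return {state: [missing week numbers]} for states with gaps.

    `reported` maps state name -> set of week numbers that have data
    (hypothetical layout, e.g. built from a CDC download).
    """
    expected = set(range(first_week, last_week + 1))
    gaps = {}
    for state, weeks in reported.items():
        missing = sorted(expected - set(weeks))
        if missing:
            gaps[state] = missing
    return gaps


# Made-up example: one state complete, one missing its last three weeks.
example = {
    "Ohio": set(range(1, 22)),
    "Connecticut": set(range(1, 19)),
}
gaps = missing_weeks(example)
print(gaps)                                # {'Connecticut': [19, 20, 21]}
print(sum(len(w) for w in gaps.values()))  # total missing weeks: 3
```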

I get it from the CDC site: https://gis.cdc.gov/grasp/fluview/mortality.html

It says "Pneumonia and Influenza", but it does have a "Total Deaths" column, which is what I was looking for. I gave up on trying to get data off the various state websites. All are laid out differently, with different years available. It was a total mess. Hard to make good decisions with bad data.
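For anyone scripting this, reading such a download into per-state weekly totals could look roughly like the sketch below. The column names `SUB AREA`, `YEAR`, `WEEK` and `TOTAL DEATHS` are placeholders, not confirmed CDC headers; check them against the actual file. The sample rows are made up. For a real file, pass in the text read with `open(path, newline="")`.

```python
import csv
import io
from collections import defaultdict

# Placeholder column names: verify against the real file's header row.
STATE_COL, YEAR_COL, WEEK_COL, DEATHS_COL = "SUB AREA", "YEAR", "WEEK", "TOTAL DEATHS"


def load_weekly_totals(csv_text):
    """Return {state: {(year, week): total_deaths}} from CSV text.

    Rows whose death count is blank or non-numeric (for example a note
    such as "Insufficient Data") are skipped.
    """
    totals = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row[DEATHS_COL].replace(",", "").strip()  # drop thousands separators
        if not raw.isdigit():
            continue
        key = (int(row[YEAR_COL]), int(row[WEEK_COL]))
        totals[row[STATE_COL]][key] = int(raw)
    return dict(totals)


sample = '''SUB AREA,YEAR,WEEK,TOTAL DEATHS
Ohio,2020,1,"2,510"
Ohio,2020,2,2481
North Carolina,2020,1,Insufficient Data
'''
print(load_weekly_totals(sample))
```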

TRM13 commented

Your 71,700 through week 17 sounds about right. Through week 21 I have 79,816, so that's close, and with 8 more weeks of data still to be posted it will undoubtedly go up.

It would be interesting to see what excess deaths look like in other countries. I have enough trouble just finding things on the CDC site, and it's in English (my native language). LOL.

Thanks for the feedback.

TRM13 commented

Good work. I hope the Financial Times takes you up on your offer. You could put your data and scripts on Dropbox or here on GitHub when you are ready.

The one change I would make to your method concerns this:
"calculate the mortality of each country from the week the first Covid deaths were reported"

That would miss early cases that could have been confused with influenza. In the USA it was probably in circulation in January and February, but the CDC discouraged testing during that period. The athletes returning from the World Military Games most likely brought it back in early November.

I start mine at January 1 because that is where the CDC starts its yearly mortality statistics, which makes comparisons with previous years easier. This differs from the pneumonia and influenza deaths, which the CDC lists by season, from week 40 to week 39 of the following year.
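Because the pneumonia and influenza tables run by season (week 40 through week 39), re-aligning them to calendar years needs a small mapping. A sketch, assuming a season is identified by the calendar year in which it starts (2019 for the 2019-20 season); the function name is hypothetical:

```python
def calendar_year(season_start, week):
    """Map a (flu season, MMWR week) pair to its calendar year.

    A season starting in `season_start` covers week 40 onward of that
    year and weeks 1-39 of the following year.
    """
    if not 1 <= week <= 53:
        raise ValueError(f"invalid MMWR week: {week}")
    return season_start if week >= 40 else season_start + 1


# Weeks from the 2019-20 season
print(calendar_year(2019, 52))  # 2019
print(calendar_year(2019, 1))   # 2020
```

Week 53 is allowed because some years (2020 among them) have 53 MMWR weeks.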

Tracking month by month is an interesting idea. That could be a way to track any "second wave" effects (although due to lockdowns it is really just a delayed first wave).

TRM13 commented

Now you got me going down the rabbit hole again. Love it. Thanks.

After you mentioned it, I realized month-by-month statistics could be very handy, so I put them together. They will also be very useful for seeing whether the reopened states are having a second wave.

In short, January and February were pretty much in line with the previous 4-year average. March surprised me, and the huge hit appears to be in April. Looking forward to weeks 19-22 (May).

Using the same data I downloaded on 2020-06-05, I ran the numbers for each month. The CDC tracks by week, not by month, so it is a bit of a kludge, but these are the weeks that approximately correspond to each month:
Weeks 1 to 5 ~= January (Entire USA was -0.8%)
Weeks 6 to 9 ~= February (Entire USA was +0.3%)
Weeks 10 to 13 ~= March (Entire USA was +3.9%)
Weeks 14 to 18 ~= April (Entire USA was +28.7%)
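The week-to-month approximation above can be written down as a lookup table and applied slice by slice. This sketch uses the same 4-year average idea; `MONTH_WEEKS` copies the ranges listed above, while the function name and the data layout (a dict keyed by `(year, week)`) are assumptions.

```python
from statistics import mean

# Approximate month -> CDC week ranges, as listed above (inclusive).
MONTH_WEEKS = {
    "January": range(1, 6),
    "February": range(6, 10),
    "March": range(10, 14),
    "April": range(14, 19),
}


def monthly_percent_change(deaths, year, n_years=4):
    """Percent difference between `year` and the prior `n_years` average
    for each approximate month. `deaths` maps (year, week) -> deaths."""
    result = {}
    for month, weeks in MONTH_WEEKS.items():
        observed = sum(deaths[(year, w)] for w in weeks)
        baseline = sum(
            mean(deaths[(year - k, w)] for k in range(1, n_years + 1))
            for w in weeks
        )
        result[month] = round(100.0 * (observed - baseline) / baseline, 1)
    return result
```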

After that, the data is still too incomplete to be useful, and even week 18 is missing too much for my liking.

Interestingly, weeks 10 to 13 (March) show very little except for New York City at +49.7%, with only a few others breaking double digits (Montana +10.1%; South Carolina +11.7%; Louisiana +11.7%; New Jersey +12%). Also notable were the states well below the 4-year average (Connecticut -12.6%; Pennsylvania -19.2%). Pennsylvania was -14%, -29% and -19% for the first 3 months (I've got to look into that).

Weeks 14 to 18 (April) are where the biggest hit took place:
New York City = +415.9%
New Jersey = +169%
New York State = +87.7%
Massachusetts = +85.5%

I'll update my Dropbox with the monthly script and data once I check them again and clean them up this weekend. In the meantime, I've attached the reports spreadsheet with the monthly stats.
reports.xlsx

https://www.dropbox.com/sh/fh9x5fngmfbeiiu/AAAH-OtOMqiY_R9qqG6YccCRa?dl=0

TRM13 commented

It was a reference to "Alice's Adventures in Wonderland" by Lewis Carroll, who was a brilliant mathematician as well as a writer. "Down the rabbit hole" is a phrase used when weird stuff starts happening. It seemed appropriate.

I like the time-slice idea, and the finest granularity we can get is weekly, so I'll run those. As of this morning, only North Carolina is missing week 18. To get to the end of May we'll need 22 weeks, and a lot is still missing from those.

Thanks and TTYL

TRM13 commented

I'm going to put together a "Weekly" report to go along with the "Year To Date" and "Monthly" ones. That is the finest granularity I can get with the CDC data, and it should be fine. I don't think we'll see much difference from the monthly, but it will be interesting to see how the supposed "second wave" works out. In reality it is just the first wave, delayed by shutdowns.

Looking at things across different time spans is useful: the full-year statistics show the overall effect, while the monthly and weekly ones show when things happened and how fast.
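A weekly report is the same calculation without any grouping: one percentage per week. A hedged sketch, again assuming a dict keyed by `(year, week)` and a 4-year baseline; the function name is hypothetical:

```python
from statistics import mean


def weekly_percent_change(deaths, year, weeks, n_years=4):
    """Return [(week, percent vs. prior n_years average)] for each week.

    `deaths` maps (year, week) -> total deaths (hypothetical layout).
    """
    series = []
    for w in weeks:
        baseline = mean(deaths[(year - k, w)] for k in range(1, n_years + 1))
        pct = 100.0 * (deaths[(year, w)] - baseline) / baseline
        series.append((w, round(pct, 1)))
    return series
```

Summing the same weekly numbers over a range of weeks gives back the monthly or year-to-date figures, so the three reports stay consistent with each other.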

As luck would have it, I'm quite busy for the next 7-10 days, but given that the data isn't complete past week 17, it shouldn't really matter. Once-a-month updates should be fine.

TTYL

TRM13 commented

I don't work for FT or anyone else. I just reported an error in their selection of states here, and we sort of hijacked the thread. LOL.

Unfortunately, specific causes of death won't be available from the CDC for 2-3 years. It would be interesting to see how the health care workers are faring.