covidatlas/li

scraped NY Times data flawed

Closed this issue · 5 comments

Original issue https://github.com/covidatlas/coronadatascraper/issues/978, transferred here on Thursday May 07, 2020 at 14:43 GMT


US county data differ from those in the New York Times source file (https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv).

E.g. Providence County, Rhode Island:
2020-04-29 - 3431 cases in your data
2020-04-29 - 5967 cases in NY Times data

I don't know if other counties are affected as well.
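
For anyone who wants to reproduce the comparison, here is a minimal sketch. It assumes the NYT file keeps its `date,county,state,fips,cases,deaths` header. The helper name `nyt_cases` and the inline sample row are illustrative only: the sample reuses the 5967 figure quoted above and leaves the unused `fips` and `deaths` fields blank.

```python
import csv
import io


def nyt_cases(csv_text, date, county, state):
    """Return the case count for one county on one date, or None if absent."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row["date"], row["county"], row["state"]) == (date, county, state):
            return int(row["cases"])
    return None


# Illustrative sample in the NYT column layout (not a download of the real file).
SAMPLE = (
    "date,county,state,fips,cases,deaths\n"
    "2020-04-29,Providence,Rhode Island,,5967,\n"
)

nyt = nyt_cases(SAMPLE, "2020-04-29", "Providence", "Rhode Island")
scraped = 3431  # value reported in this issue for the scraped data
difference = nyt - scraped  # size of the gap between the two sources
```

To check the live data, the text of the CSV at the URL above could be fetched (for example with `urllib.request.urlopen`) and passed in as `csv_text`.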

(Transferred comment)

Thanks for the issue!

We scrape multiple sources and cross-check them. It’s possible that another source took precedence over the NYT one.
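
To illustrate what "took precedence" could mean, here is a rough sketch. This is a hypothetical model, not the scraper's actual code: it assumes each source record for a location carries a numeric `priority` and that the highest one wins. The source names, priorities, and the `pick_by_priority` helper are invented for the example; the case counts are the two figures from this issue.

```python
def pick_by_priority(records):
    """Return the record with the highest priority; ties go to the first one seen."""
    return max(records, key=lambda r: r["priority"])


# Hypothetical records for the same county and date from two sources.
records = [
    {"source": "source_a", "priority": 1, "cases": 5967},
    {"source": "source_b", "priority": 2, "cases": 3431},
]

chosen = pick_by_priority(records)
# Under these assumed priorities, source_b's lower count is the one published.
```

With a rule like this, two sources that disagree never get averaged or flagged; whichever has the higher priority silently wins, which would explain a published number that differs from the NYT file.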

Is this still occurring? Cheers! Jz


(Transferred comment)

Thank you for building up this great database!

The issue is still occurring.

Best, Hannah

(Transferred comment)

Is it possible that RI is reporting current cases, not cumulative ones? Their own website very clearly says 3,913 for Providence, which is less than yesterday's figure, wtf? https://ri-department-of-health-covid-19-data-rihealth.hub.arcgis.com/
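
One quick way to test the current-vs-cumulative hypothesis: a cumulative series should never go down from one day to the next. A small sketch of that check follows. The function name `find_decreases` and the sample numbers are made up for illustration and are not RI's actual daily figures.

```python
def find_decreases(counts):
    """Return the indices where a count is lower than the previous day's count."""
    return [i for i in range(1, len(counts)) if counts[i] < counts[i - 1]]


# Made-up daily counts: the third value drops below the second.
daily_counts = [10, 12, 11, 15]
drops = find_decreases(daily_counts)

# A truly cumulative series would produce an empty list here.
looks_cumulative = not drops
```

Any non-empty result means the series is either not cumulative or was revised downward by the reporting agency.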

(Transferred comment)

Reached out to RI; they said:

Good morning,

Thank you for reaching out. The data is updated every day and cumulative.

Best,

Isabella
COVID-19 Joint Information Center

With that, it does seem that NYT and JHU are wrong, or are counting data differently somehow. Maybe it has to do with RI reporting at a city level for some places?

(Transferred comment)

This is a common NYT problem: they're counting higher for other locations too. Perhaps we shouldn't use them.