cmu-delphi/www-covidcast

County-level choropleth map for hospitalizations shows no data available

Closed this issue · 8 comments

Now that we have closed the county-level hospitalization signals issue #1164, which I see was merged in #1182, we should be seeing the county-level data on the main dashboard.

However, here: https://delphi.cmu.edu/covidcast/indicator/?sensor=hhs-confirmed_admissions_covid_1d_prop_7dav&date=20220620 (screenshot below)

Screen Shot 2022-06-23 at 8 19 11 AM

the choropleth map shows no data available. This missingness bug seems to propagate down to the state level views too (which should show all counties therein):

Screen Shot 2022-06-23 at 8 20 34 AM

Ahhhhhh OK so I think I've figured it out. The last county-level data (CPR) is apparently available on June 15. Whereas the last state and national level (HHS) is apparently available on June 21.

Screen Shot 2022-06-23 at 8 23 15 AM

Screen Shot 2022-06-23 at 8 22 28 AM

Sorry for the monologue here. As I was writing the issue, I figured out what was happening. So it brings me to one question, and one suggestion.

  1. Is this 6-day difference normal or is it longer than usual? I thought the normal difference was shorter.
  2. Instead of simply saying "No data available" in a plot (like a grayed out choropleth map), can we amend that to say "No data available after XYZ", where XYZ is the last calculated availability for that signal, and even better, is hyperlinked to take you back to the same page, but on that day?
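To make the second suggestion concrete, here is a minimal Python sketch of building the "No data available after XYZ" message with a link back to the same page on that day. It assumes the `date` query parameter uses the YYYYMMDD form seen in the URL above; `link_to_date` and `no_data_message` are hypothetical names for illustration, not functions from www-covidcast.

```python
from datetime import date
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def link_to_date(page_url: str, latest: date) -> str:
    """Return page_url with its `date` query parameter set to `latest` (YYYYMMDD)."""
    parts = urlsplit(page_url)
    # Keep every other query parameter (e.g. `sensor`) unchanged.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "date"
    ]
    query.append(("date", latest.strftime("%Y%m%d")))
    return urlunsplit(parts._replace(query=urlencode(query)))


def no_data_message(page_url: str, latest: date) -> str:
    """Hypothetical wording for the grayed-out map, with the jump-back link."""
    return (
        f"No data available after {latest.isoformat()} "
        f"(see {link_to_date(page_url, latest)})"
    )
```

Where `latest` comes from (for example, nightly metadata) is left open here.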

attn @duanecmu @sgratzl cc @krivard since you guys have all been involved in this county-level view topic before.

It's not typical; CPR didn't update yesterday. @neul3 will know whether that was a bug on our end or if DSEW didn't upload the report on time.

CPR did update today (around noon); as of my writing this at 2022-06-23 14:07 you can see county-level data up to the 19th and state-level data up to the 21st, which is a more typical difference.

I like the idea of a direct link to the most recent availability for the selected signal. Unfortunately the easiest way to do this would be with metadata, which is still only computed nightly. For CPR data, that could mean being up to 4 days different from the actual most recent availability, e.g. on a Tuesday afternoon after a federal holiday 3-day weekend.

Oh noooo this is about to get worse: One of the updates published on healthdata.gov yesterday added this text:

"Effective June 22, 2021, the Community Profile Report will only be updated twice a week, on Wednesdays and Fridays."

I expect they meant 2022 -- this is the final line in a sequence of "Effective $DATE, ..." phrases that go:

  • April 2021
  • June 2021
  • August 2021
  • June 2021 --> probably 2022

We do have interpolation running on hospital admissions, but we don't do any forward projections. That means we'll have varying lags for the county-level data, ranging from 2 days on a Wednesday or Friday afternoon, to 7 days on a Wednesday morning.
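The lag arithmetic can be sketched in Python as below. This is only an illustration under assumptions: reports land on Wednesdays and Fridays, each report covers data up to 2 days before its publication day (inferred from the "2 days on a Wednesday or Friday afternoon" figure), and "morning" means the report for that day has not been published yet. `county_lag_days` is a hypothetical helper.

```python
from datetime import date, timedelta

PUBLISH_WEEKDAYS = {2, 4}  # date.weekday(): Monday is 0, so Wednesday=2, Friday=4
BASE_LAG_DAYS = 2  # assumption: a report covers data up to 2 days before publication


def county_lag_days(today: date, published_today: bool) -> int:
    """Days between `today` and the newest county-level data date."""
    day = today
    # If today's report is not out (or today is not a publication day),
    # walk back to the most recent Wednesday or Friday.
    if not (day.weekday() in PUBLISH_WEEKDAYS and published_today):
        day -= timedelta(days=1)
        while day.weekday() not in PUBLISH_WEEKDAYS:
            day -= timedelta(days=1)
    return (today - day).days + BASE_LAG_DAYS
```

Under these assumptions, a Wednesday or Friday afternoon gives 2 days, a Friday morning gives 4, and a Wednesday morning gives 7 (5 days since the previous Friday's report, plus the 2-day base lag).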

Thanks. Too bad about the last thing you raised. Since forward projections are going to be tricky (essentially nowcasting/forecasting), a potentially simpler solution is as follows:

When a signal is not available at the current date but is available <= N days ago (where N is some number we choose, like 7 or 14), then we:

  • print the most recently available signal values for the numeric displays
  • plot the most recently available signal values in beehive grids or choropleth maps

And we put a visible asterisk and a clause stating that we are displaying data for XYZ (where XYZ is the date of the most recent available data).
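A rough Python sketch of this rule, to pin down the edge cases. It assumes data exists for every date up to the latest available one, and that N = 7 by default; `Display` and `resolve_display` are hypothetical names, and the footnote wording is only a placeholder.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Display:
    shown_date: Optional[date]  # None means "show N/A"
    footnote: Optional[str]  # text attached to the asterisk, if any


def resolve_display(
    selected: date, latest_available: date, max_lookback_days: int = 7
) -> Display:
    """Pick the date whose values are displayed for the selected date."""
    if latest_available >= selected:
        # Data exists for the selected date: no asterisk needed.
        return Display(selected, None)
    gap = (selected - latest_available).days
    if gap <= max_lookback_days:
        # Fall back to the most recent data and flag it.
        return Display(
            latest_available,
            f"* Showing data for {latest_available.isoformat()}",
        )
    # Too old to fall back on: show N/A and say why.
    return Display(
        None,
        f"No data available within {max_lookback_days} days of {selected.isoformat()}",
    )
```

For the case in this issue (June 20 selected, county data through June 15), the gap is 5 days, so the June 15 values would be shown with an asterisk.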

This is consistent with what we're already doing in the indicators table. When we went to v3, I suggested to Sam that we do this and he implemented it (though I think N here is just set to the number of days in the sparkline display). Screenshot below.

Screen Shot 2022-06-23 at 2 55 02 PM

We could continue discussing through GitHub, but an alternative suggestion is that you could convene a small group of people (probably including Sam on the viz/frontend side but also including somebody on the data side like Dmitry or Logan) to discuss these options and make a proposal. I think this idea is most acutely needed for the county-level hospitalizations, but could be applied across the board. Thanks!

When a signal is not available at the current date but is available <= N days ago [...]

I'm happy with this if @sgratzl is -- sam is this reasonable or should we set a time to discuss further?

so clarifying points from what I understand:

  1. when you navigate to a signal + location (identified by its level) + date for which we don't have data yet (i.e. the metadata says date > max date), then we show a banner at the top explaining it, with a link to jump to the latest date.

This would be consistent with the selection you made at the top. Otherwise, you navigate in time but the data doesn't change and is "stuck" at the latest available date.

  2. the tricky part is as in the pictures above: for one signal we are showing different signals for different levels, which can lead to different max dates for different levels, e.g. geo=ca + choropleth => will show counties within ca but states for the rest of the map, and thus different signals.

Mixing dates for the same visual signal (even though they are technically two) in charts that only show one date (like the choropleth map) might cause confusion.

In the signal table it works, since every signal has its time series next to it, with a visual indicator of which date is shown. Moreover, those are different signals, for which users can expect more inconsistencies.

So I guess we're going the discuss on GitHub route 🙂

Re your first point: yes that sounds good. That would solve at least some of the issues. To double check, this all only applies to "explore an indicator" pages, right? When the indicator + location isn't available at the current time selection, you jump back to the latest available time selection. Perhaps we should still put a rule in though, that you don't jump back by more than N days, and we show N/As everywhere if there is nothing available within N days of the current time selection. Say N = 14 or N = 7.

Re your second point, let's try to discuss in the context of what's left (what isn't solved by the rule in your first point). By my count, what's left is:

  1. "explore an indicator" page where the location is a nation or a state, and the signal values shown within are from different locations (like counties within the state or nation) which may have different availabilities
  2. "explore a location" page where the signals are mixed and hence availabilities are mixed

For number 2, we're already all good with the behavior in the time series plots (i.e., the charts and the indicator table with sparklines at the bottom). So it's only the map and the signal numbers at the top. For the map, on the "explore a location" page, we currently show only COVID cases, and being all from JHU-CSSE it shouldn't have varying availability between nation/states/counties (unless something changes), so let's leave it for now. For the signal numbers at the top, here's what I was thinking:

If one of the signals (should typically be hospitalizations since that's lagging by at least 1 if not more days, depending on the geographic resolution) isn't available at the current selection, but is available within N days, then we show its latest value with a clear asterisk and a little footnote:

IMG_0104

If not available within N days (here N = 7) then we show N/A and explain why:

IMG_0104 2


For number 1 in my list of "what's left" above, we need to figure out what to do with the map, and the table at the bottom with sparklines. For the table at the bottom, we could use the same thing we do in the indicators table at the bottom of "explore a location" pages, where we just snap to the most recent available numbers for each row (within the range of the sparkline) and the sparkline shows the corresponding date.

For the map, we could do something similar to my above suggestion with asterisks. For all signal data that goes into the current map view, pick the most recent date at which the most granular signal data (counties in state view, states for beehive in nation view, counties for choropleth in nation view) is available, put a clear asterisk and a footnote if this forces us to roll back from the current date selection, or show N/As and explain why if nothing's available within N days.
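As a sketch of the map rule in Python: take the max dates of the signals feeding the current map view, keyed by geographic level, and let the most granular level decide the date. The level names, their ordering in `GRANULARITY`, and `map_date` itself are assumptions for illustration only.

```python
from datetime import date
from typing import Dict, Optional, Tuple

# Assumed ordering, from least to most granular.
GRANULARITY = ["nation", "state", "county"]


def map_date(
    selected: date, max_dates: Dict[str, date], max_lookback_days: int = 7
) -> Tuple[Optional[date], bool]:
    """Return (date to show, rolled_back); the date is None when the map shows N/A.

    `max_dates` maps each level present in the current map view to the
    latest date with data at that level.
    """
    # The most granular level in view decides which date the whole map uses.
    finest = max(max_dates, key=GRANULARITY.index)
    latest = max_dates[finest]
    if latest >= selected:
        return selected, False
    if (selected - latest).days <= max_lookback_days:
        # Rolled back: the caller adds the asterisk and footnote.
        return latest, True
    return None, False
```

With states available through June 21 and counties through June 15, a June 21 selection on a state view would roll the whole map back to June 15 and flag it, rather than mixing dates within one map.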


What do you think?

@sgratzl Thanks for the handiwork a while back, is this fixed by #1187?

Yep!