pacificclimate/pdp_util

util.get_stn_list() returns stations for which no observations exist

Closed · 1 comment

Background

  • There are stations in the PCDS database for which no observations exist (e.g. ARDA/105046).
  • The pdp agg (aggregation) response works by taking a number of "station filters" as HTTP parameters, translating them into a list of stations, and then issuing a request to pydap for each station.
  • Creating the station list is (mostly) delegated to util.get_stn_list().

Problem

Stations without observations can end up in the station list. If the user request includes no start_date filter, stations without observations (which have no start date) are not filtered out.
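A minimal sketch of how this can happen, using plain Python instead of the real query. The record layout, the select_stations() helper, and the direction of the date comparison are all assumptions made for illustration; this is not the actual util.get_stn_list() code.

```python
from datetime import date

# Hypothetical station records; keys mirror the meta_history columns.
STATIONS = [
    {"history_id": 1, "min_obs_time": date(1990, 1, 1)},
    {"history_id": 2, "min_obs_time": None},  # station with no observations
]


def select_stations(stations, start_date=None):
    """Assumed behaviour: start_date=None means 'do not filter on start date'."""
    if start_date is None:
        # No filter requested, so every station is returned,
        # including the one that has no observations.
        return [s["history_id"] for s in stations]
    # Mimic SQL semantics: a comparison against NULL is never true, so
    # stations without a min_obs_time drop out whenever a date is given.
    # The ">=" direction is an assumption, not taken from the real code.
    return [
        s["history_id"]
        for s in stations
        if s["min_obs_time"] is not None and s["min_obs_time"] >= start_date
    ]


print(select_stations(STATIONS))                    # [1, 2]
print(select_stations(STATIONS, date(1980, 1, 1)))  # [1]
```

Under these assumptions the empty station is excluded only as a side effect of supplying a start date, which matches the behaviour described above.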

Resolution?

Unclear. NULL values for min_obs_time and max_obs_time in meta_history, and for the corresponding HTTP parameters, are overloaded:

  • In the database, a NULL max_obs_time means that the station is active.
  • NULL or missing start or end dates in the HTTP params mean "do not filter on this attribute".
  • There is no database convention for specifying that a station has no data. Maybe a NULL min_obs_time? That does appear to be consistent with the data: no station with a NULL min_obs_time has any rows in obs_raw.
crmp=> select count(*), history_id from crmp_network_geoserver natural join obs_raw where min_obs_time is null group by history_id;
 count | history_id 
-------+------------
(0 rows)

So perhaps we can just add a min_obs_time is not NULL filter, and that will clear it up.
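As a rough illustration of that filter, here is a sketch using an in-memory SQLite table as a stand-in for crmp_network_geoserver. The reduced table layout, the sample rows, and the station_ids() helper are hypothetical; the real code would add the condition to whatever query util.get_stn_list() builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for crmp_network_geoserver with only the
# columns discussed in this issue.
conn.execute(
    "CREATE TABLE crmp_network_geoserver "
    "(history_id INTEGER, min_obs_time TEXT, max_obs_time TEXT)"
)
conn.executemany(
    "INSERT INTO crmp_network_geoserver VALUES (?, ?, ?)",
    [
        (1, "1990-01-01", "2000-01-01"),  # closed station with data
        (2, "1995-06-01", None),          # active station (NULL max_obs_time)
        (3, None, None),                  # station with no observations
    ],
)


def station_ids(conn, require_obs=True):
    """Return history_ids, optionally dropping stations with no observations."""
    sql = "SELECT history_id FROM crmp_network_geoserver"
    if require_obs:
        # The proposed fix: treat a NULL min_obs_time as "no data".
        sql += " WHERE min_obs_time IS NOT NULL"
    sql += " ORDER BY history_id"
    return [row[0] for row in conn.execute(sql)]


print(station_ids(conn, require_obs=False))  # [1, 2, 3]
print(station_ids(conn))                     # [1, 2]
```

The active station (NULL max_obs_time) is kept, because the condition looks only at min_obs_time, so the existing "NULL means active" convention is not disturbed.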

Duplicate of #8