GClunies/py_noaa

Large requests with a data gap between begin_date and end_date may throw errors

GClunies opened this issue · 1 comment

Background:
NOAA Tides & Currents limits the size of data requests (to either 31 days or 365 days) based on the product type and the interval at which the user requests the data.

As a result, when a large request (months to years of data) is made using coops.get_data(), the function handles it by looping, with each iteration making a separate request for a "block" of data. On each iteration, the block's begin_date and end_date are adjusted accordingly. A rough sketch of this block splitting is shown below.
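The sketch below is an illustration only, not py_noaa's actual internals; the 31-day block_days default is an assumption (the real block size depends on product and interval):

```python
from datetime import datetime, timedelta

def date_blocks(begin_date, end_date, block_days=31):
    """Split a date range into consecutive blocks (illustrative sketch)."""
    begin = datetime.strptime(begin_date, "%Y%m%d")
    end = datetime.strptime(end_date, "%Y%m%d")

    blocks = []
    block_start = begin
    while block_start < end:
        block_end = min(block_start + timedelta(days=block_days), end)
        # Each (begin, end) pair becomes a separate API request; if a block
        # falls entirely inside a gap in the station record, the API returns
        # an 'error' entry and url2pandas() raises ValueError.
        blocks.append((block_start.strftime("%Y%m%d"),
                       block_end.strftime("%Y%m%d")))
        block_start = block_end
    return blocks

# e.g. date_blocks("20000101", "20160101") -> roughly 190 blocks of 31 days
```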

Issue:
When making a large request (e.g. the 16-year request in the example below), if there happens to be a "break" in the data record that is longer than the "block" size used in each loop, a ValueError is raised: at least one block's begin_date and end_date both fall inside the break, so the API returns no data for that block and the request fails.

Example of Issue:
The example below throws an error. It requests wind data from 2000 to 2016 (there is missing data somewhere between 2010 and 2012). Right now this needs to be split into two separate requests; a workaround sketch is shown after the traceback:

In [1]: from py_noaa import coops

In [2]: df_winds_KIP = coops.get_data(
   ...: begin_date="20000101",
   ...: end_date="20160101",
   ...: stationid="8632200",
   ...: product="wind",
   ...: interval="h",
   ...: units="english")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-652ff01d4272> in <module>()
      5 product="wind",
      6 interval="h",
----> 7 units="english")

~\AppData\Local\Continuum\anaconda3\lib\site-packages\py_noaa\coops.py in get_data(begin_date, end_date, stationid, product, datum, bin_num, interval, units, time_zone)
    287                 stationid, product, datum, bin_num, interval, units, time_zone)
    288
--> 289             df_new = url2pandas(data_url, product)  # Get dataframe for block
    290             df = df.append(df_new)  # Append to existing dataframe
    291

~\AppData\Local\Continuum\anaconda3\lib\site-packages\py_noaa\coops.py in url2pandas(data_url, product)
    157     if 'error' in json_dict:
    158         raise ValueError(
--> 159             json_dict['error'].get('message', 'Error retrieving data'))
    160
    161     if product == 'predictions':

ValueError: No data was found. This product may not be offered at this station at the requested time.

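As a workaround today, the request can be split on either side of the gap and the results concatenated. A minimal sketch, assuming the break roughly spans 2010–2012 (the exact split dates would need to be confirmed against the station record):

```python
import pandas as pd
from py_noaa import coops

# Fetch the data before the suspected gap (dates are assumptions).
df_before_gap = coops.get_data(
    begin_date="20000101",
    end_date="20091231",
    stationid="8632200",
    product="wind",
    interval="h",
    units="english")

# Fetch the data after the suspected gap.
df_after_gap = coops.get_data(
    begin_date="20120101",
    end_date="20160101",
    stationid="8632200",
    product="wind",
    interval="h",
    units="english")

# Combine the two pieces into a single dataframe.
df_winds_KIP = pd.concat([df_before_gap, df_after_gap])
```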
Note: #29 was found to break the build (it did not show up in Travis CI initially?) and was therefore reverted. This issue is still unsolved. One possible direction for a fix is sketched below.
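A sketch only, not a tested patch: the idea is to catch the per-block "No data was found" ValueError and skip that block instead of aborting the whole request. This is written here at the user level (yearly chunks are an assumption, and the error-text check is based on the traceback above), but the same behavior could be adopted inside coops.get_data():

```python
import pandas as pd
from py_noaa import coops

def get_data_skip_gaps(begin_year, end_year, **kwargs):
    """Fetch one year at a time, skipping years where the API reports no data.

    Hypothetical helper; it mirrors what a fix inside coops.get_data()
    could do, i.e. continue past empty blocks instead of raising.
    """
    frames = []
    for year in range(begin_year, end_year):
        try:
            frames.append(coops.get_data(
                begin_date=f"{year}0101",
                end_date=f"{year}1231",
                **kwargs))
        except ValueError as err:
            # Error text assumed from the traceback above.
            if "No data was found" in str(err):
                continue  # this year falls inside a gap in the record
            raise
    return pd.concat(frames) if frames else pd.DataFrame()

df_winds_KIP = get_data_skip_gaps(
    2000, 2016,
    stationid="8632200",
    product="wind",
    interval="h",
    units="english")
```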