DOI-USGS/dataretrieval-python

Documentation of function parameters is inconsistent and sometimes misleading for new users

horsburgh opened this issue · 3 comments

Using dataretrieval 1.0.6 in my class this week. Students noticed the following in PyCharm:

image

I told them they could pass a single site as a string or a list of sites as a list of strings, but they were confused because the highlighted message says "Expected type 'array.pyi' got 'str' instead".

The documentation of the get_discharge_peaks function says "array of strings":

image

I even had one student who converted his site code to a Numpy array and passed it to the get_discharge_peaks function. The function actually returns a response, but it's not just for the one site - it had data for lots of sites. Didn't dig in too deep there.

Documentation of the sites parameter for other functions is ambiguous, but not as confusing. In PyCharm:

image

In the code:

image

@thodson-usgs - we could take a crack at cleaning this up unless this is intentional or there's another reason not to?

@horsburgh, no that seems odd and the more I look through the code there are several things I'm wondering about.
I have a few suggestions:

  1. if possible, start with a smaller PR implementing this for one module. Let me review that and make suggestions before proceeding to others.
  2. sites should be a list or list-like (consider adding type checking to enforce that (as well as asserting pandas.api.types.is_list_like. Refer to that doc about which list-like classes should be covered in the type check)
  3. It's not totally clear why functions were broken apart like get_iv and _iv, etc? A separate PR could put those back together.
  4. Encourage students to start with a unit test, then submit once that test passes.
  5. Just in general, smaller more focused PRs make it much easier for me to review things and keep the code base organized.

@thodson-usgs - should the sites argument in the dataretreival functions actually be anything that is_list_like according to the pandas specification, or should it just allow a single string or a list of strings? The function works correctly if I pass a single string or a list of strings, but if I pass a numpy array (for example) containing a single string, the function runs but doesn't return the right result. I don't think there's any code to read or convert other list-like structures (array, set, series, etc.) to what is needed for the function(s) to run correctly.

Pabitra just submitted a pull request with an initial implementation of type hints for just the one function so you can have a look and see if you like that. If you are OK with what he's done, we could work on the other functions.

@horsburgh, his PR looks great.

This is off the cuff, but I suspect it's as easy as

if type(x) is string:
    continue
elif is_list_like(x):
     x = list(x)
else:
     raise