Documentation of function parameters is inconsistent and sometimes misleading for new users
horsburgh opened this issue · 3 comments
Using dataretrieval 1.0.6 in my class this week. Students noticed the following in PyCharm:
I told them they could pass a single site as a string or a list of sites as a list of strings, but they were confused because the highlighted message says "Expected type 'array.pyi' got 'str' instead".
The documentation of the get_discharge_peaks function says "array of strings":
I even had one student who converted his site code to a Numpy array and passed it to the get_discharge_peaks function. The function actually returns a response, but it's not just for the one site - it had data for lots of sites. Didn't dig in too deep there.
Documentation of the sites parameter for other functions is ambiguous, but not as confusing. In PyCharm:
In the code:
@thodson-usgs - we could take a crack at cleaning this up unless this is intentional or there's another reason not to?
@horsburgh, no that seems odd and the more I look through the code there are several things I'm wondering about.
I have a few suggestions:
- if possible, start with a smaller PR implementing this for one module. Let me review that and make suggestions before proceeding to others.
- sites should be a list or list-like (consider adding type checking to enforce that (as well as asserting pandas.api.types.is_list_like. Refer to that doc about which list-like classes should be covered in the type check)
- It's not totally clear why functions were broken apart like get_iv and _iv, etc? A separate PR could put those back together.
- Encourage students to start with a unit test, then submit once that test passes.
- Just in general, smaller more focused PRs make it much easier for me to review things and keep the code base organized.
@thodson-usgs - should the sites argument in the dataretreival functions actually be anything that is_list_like
according to the pandas specification, or should it just allow a single string or a list of strings? The function works correctly if I pass a single string or a list of strings, but if I pass a numpy array (for example) containing a single string, the function runs but doesn't return the right result. I don't think there's any code to read or convert other list-like structures (array, set, series, etc.) to what is needed for the function(s) to run correctly.
Pabitra just submitted a pull request with an initial implementation of type hints for just the one function so you can have a look and see if you like that. If you are OK with what he's done, we could work on the other functions.
@horsburgh, his PR looks great.
This is off the cuff, but I suspect it's as easy as
if type(x) is string:
continue
elif is_list_like(x):
x = list(x)
else:
raise