Consider making the functions called by `get_record()` private
mnfienen opened this issue · 2 comments
As a new user, in a jupyter notebook, I used the tab completion on dataretrieval
as imported and found the many get_...
functions including get_pmcodes
. But then I was surprised that it returned a tuple including metadata.
Looking in the code, I see that get_record
is a light wrapper around all these that returns only the df
. All good, but it might be nice to push users toward get_record
and make all the underlying functions private. Just a refactor from get_pmcodes
to _get_pmcodes
etc. I'm happy to refactoir, but wanted to see if this is something intentional for reasons I'm not seeing.
Alternatively, could there be a default in all the get
functions to suppress returning the metadata unless requested? This would make it easier to know which **kwargs
the underlying function needs.
I originally used get_record
, but other contributors convinced me to deprecate it (it retains its legacy behavior)
Metadata was a later addition. Nobody every liked how it was implemented, but this was fundamentally a limitation of pandas
and its lack of a metadata standard.
Ideally, we'd put the metadata in pandas.DataFrame.attrs
but pandas
has flagged attrs
as experimental and may change without warning...
My plan was to continue to return a tuple until pandas
improves its metadata or xarray
natively handles ragged arrays.
But we've waited several years already, so it's good to revisit this.
You might also prefer HyRiver, which is another great collection of packages. I frequently use both. dataretrieval
has a much simpler creedo, which is to do one thing well.
Right on - thanks for the info.
I'm pretty stoked on this project being simple and being supported by USGS. There are other packages out there as well but they are just complicated and I like the idea of staying focused on core functionality.
So - using the various specific get
functions makes good sense too. Maybe the default idea that allows a request for the metadata but defaults to only returning the dataframe? I may be missing something, but seems like the metadata is more valuable for debugging than general use?