Add basic structure for API client class
JWCook opened this issue ยท 15 comments
The data models (#145) are mostly complete. The next step is to add support for API requests that return model objects instead of JSON. For backwards-compatibility, this should be an optional feature, not a replacement for the existing API functions.
Update: Decided to go with a compromise between options 2 and 3, described in comments below. This will involve an API client class and one controller class per resource type.
Options
There are at least two ways to go about this:
Option 1: Add a keyword arg to existing functions to specify return format
Example:
observations = get_observations(user_id='my_username', format='model')
Pros:
- This could work nicely with other possible return formats, if we add optional integration with pyinaturalist-convert later on.
format
could be used to specify any other supported format, for exampleformat='csv'
. - No new functions for users to learn
Cons:
- Adding multiple return types for the same function generally isn't good practice
- For other readers of client code that uses these functions, it can take a bit longer to understand what's going on
- A long
Union[x, y, z]
isn't very useful as a return type annotation
Option 2: Add wrapper @classmethods to model objects
Example:
observations = Observation.search(user_id='my_username')
Pros:
- Results in more readable client code
- Makes a more obvious separation between the "low-level" API (raw JSON) and "high-level" API (model objects)
Cons:
- Mixes data with logic
- It can be nice to keep those separate (as in MVC patterns), but not totally necessary
- Requires addling slightly more code to implement
- This can be mitigated somewhat by using
forge
to reuse existing function signatures & docstrings
- This can be mitigated somewhat by using
Examples in other API clients:
- Service resources in
boto3
, and other service-specific interfaces like S3.Bucket
Okay, after writing that out, I may be leaning toward option 2.
Do you have any thoughts on this, @niconoe?
Also @synrg, since you expressed interest in this, do you have a preference between these two options?
@JWCook: thanks!
While option 1 is also perfectIy fine, I really like option 2. It's like a brand new and exciting API (between client code and Pyinaturalist).
In that case, I'd also suggest communicate to users in terms that keep the separation very clear and also give guidance: "hey, we have a new major version of Pyinaturalist, you can continue using the old (mostly unchanged API) but there's a new one that's probably nicer and recommended for new projects, here is the tutorials/cheat sheet.
I'm not too concerned about the "mixes data with logic" part, in think it's inevitable to some extend, especially for a data-oriented project like this one. But maybe I'm not understanding the issue fully, do you have a specific example that bothers you and that you want to discuss?
Just my 2 cents, I think in the end you're in the best position to make the final call! Also curious to hear @synrg, it's amazing to have more people contributing (even if just suggestions, we're better together!)
I'm also in favour of option 2. Pragmatism wins out here for me over strict separation of code & logic. And if not 2, then something else other than 1, which is not very appealing for the reasons already stated.
Regarding the example given, why Observation.search()
and not Observations.search()
(as on a collection of Observation objects)?
Thanks for the input! My example was for adding that to the current Observation
model, just because that's what I have so far.
Adding separate classes for model collections is a possibility. I've actually experimented with that a bit over here, but I wasn't quite happy with how that was turning out. It seemed a bit confusing to work with a List[Observation]
in some places and an Observations
object in others. I also couldn't find any good examples of something similar in other API clients out there. Not to say it's not doable, it would just take some more thought.
I'm all for keeping things simpler. It wasn't something so much that I thought was wrong as something I didn't have a clear idea yet of how it ought to work. Looking for examples from other API clients as a guide would be where I'd start, too.
Option 3: API client class
Another slightly different option to consider would be a single API client class containing wrappers for all API functions. There are some good examples of that out there, like boto3
again (with Sessions), the other AWS SDKs, and python-gitlab.
Usage could look something like:
client = InatClient() # Probably needs a better name
observations = client.get_observations(user_id='my_username')
Pros:
- A client object could store additional state, like
requests.Session
settings, rate-limit settings, cache settings (for #158), etc. - Requires fewer imports
- Instance methods may be more intuitive than class methods for some users
- ???
Cons:
- At first glance, usage maybe doesn't look quite as nice as
<ModelName>.search()
- A lot of methods for one class, making docs a little harder to navigate compared to docs for a single resource type
- ???
I need to think through that one a bit more. Any thoughts on that?
That's exactly the choice I made in Dronefly. I wouldn't hold up my newbie python code as a shining example to follow, though. :) And yeah, it's ... messy with all those methods and no grouping together per resource type, I agree.
A hybrid approach? Client object for all the settings and stuff, but with each of the models represented thusly (from Gitlab link you gave):
# list all the projects
projects = gl.projects.list()
for project in projects:
print(project)
# get the group with id == 2
group = gl.groups.get(2)
for project in group.projects.list():
print(project)
# create a new user
user_data = {'email': 'jen@foo.com', 'username': 'jen', 'name': 'Jen'}
user = gl.users.create(user_data)
print(user)
We could have inat.observations.search()
, etc.
Yeah, I like the look of that!
I'm not sure I fully understand the last example, but the settings discussion made me think of the requests approach which I quite like:
you can either call methods (such as get()
) directly (nice for newcomers), or via a method of a session object (allows advanced settings, sharing those settings across subsequent calls, ...).
As for how I understood the last example, inat
would be an instance of INatClient (or whatever) containing settings, session, and all the other goodies. inat.observations
would be how you organize all the model methods off of the client, i.e. inat.observations.search()
would do an Observation.search()
within the context of the inat
client (settings, session, etc.), I think.
This resolves my earlier discomfort with Observation.search()
, i.e. I want to "search the observations" (plural), not "search the observation". inat.observations.search()
is a very natural way to express that, and also resolves the objection Jordan raised that putting everything on the client object would add way too many methods to it and not organize them in any fashion.
I guess it's kind of like requests, only different, because we have all of these different kinds of resources to contend with, unlike requests, which is much simpler at the top level. So we share more in common with Gitlab. Unsurprising, since both are APIs for websites with numerous interrelated resources on them.
Exactly, I see that as a good way to get the benefits of both options 2 and 3 without most of the drawbacks we mentioned. So InatClient.observations
would be a "controller" object that contains the wrapper methods for observation requests, and InatClient.taxa
for taxon requests, etc., rather than putting all the wrapper methods in a single class.
The only downside is, well, it will be more work, but I'm up for it. I should have time within the next week or so to get a start on this.
Based on a suggestion from Ben, it would be useful to have some default request params that could be set on the client and then passed to any applicable API requests. Possible params include:
- preferred_place_id
- locale
- per_page