Add basic structure for API client class

Question

Add basic structure for API client class

JWCook opened this issue 4 years ago · 15 comments

The data models (#145) are mostly complete. The next step is to add support for API requests that return model objects instead of JSON. For backwards-compatibility, this should be an optional feature, not a replacement for the existing API functions.

Update: Decided to go with a compromise between options 2 and 3, described in comments below. This will involve an API client class and one controller class per resource type.

Options

There are at least two ways to go about this:

Option 1: Add a keyword arg to existing functions to specify return format

Example:

observations = get_observations(user_id='my_username', format='model')

Pros:

This could work nicely with other possible return formats, if we add optional integration with pyinaturalist-convert later on. format could be used to specify any other supported format, for example format='csv'.
No new functions for users to learn

Cons:

Adding multiple return types for the same function generally isn't good practice
- For other readers of client code that uses these functions, it can take a bit longer to understand what's going on
- A long Union[x, y, z] isn't very useful as a return type annotation

Option 2: Add wrapper @classmethods to model objects

Example:

observations = Observation.search(user_id='my_username')

Pros:

Results in more readable client code
Makes a more obvious separation between the "low-level" API (raw JSON) and "high-level" API (model objects)

Cons:

Mixes data with logic
- It can be nice to keep those separate (as in MVC patterns), but not totally necessary
Requires addling slightly more code to implement
- This can be mitigated somewhat by using forge to reuse existing function signatures & docstrings

Examples in other API clients:

Service resources in boto3, and other service-specific interfaces like S3.Bucket

Answer 1 · 2021-06-03T00:01:45.000Z

Okay, after writing that out, I may be leaning toward option 2.

Do you have any thoughts on this, @niconoe?

Answer 2 · 2021-06-03T00:04:30.000Z

Also @synrg, since you expressed interest in this, do you have a preference between these two options?

Answer 3 · 2021-06-03T08:46:34.000Z

@JWCook: thanks!

While option 1 is also perfectIy fine, I really like option 2. It's like a brand new and exciting API (between client code and Pyinaturalist).

In that case, I'd also suggest communicate to users in terms that keep the separation very clear and also give guidance: "hey, we have a new major version of Pyinaturalist, you can continue using the old (mostly unchanged API) but there's a new one that's probably nicer and recommended for new projects, here is the tutorials/cheat sheet.

I'm not too concerned about the "mixes data with logic" part, in think it's inevitable to some extend, especially for a data-oriented project like this one. But maybe I'm not understanding the issue fully, do you have a specific example that bothers you and that you want to discuss?

Just my 2 cents, I think in the end you're in the best position to make the final call! Also curious to hear @synrg, it's amazing to have more people contributing (even if just suggestions, we're better together!)

Answer 4 · 2021-06-03T16:18:32.000Z

I'm also in favour of option 2. Pragmatism wins out here for me over strict separation of code & logic. And if not 2, then something else other than 1, which is not very appealing for the reasons already stated.

Regarding the example given, why Observation.search() and not Observations.search() (as on a collection of Observation objects)?

Answer 5 · 2021-06-03T16:35:02.000Z

Thanks for the input! My example was for adding that to the current Observation model, just because that's what I have so far.

Adding separate classes for model collections is a possibility. I've actually experimented with that a bit over here, but I wasn't quite happy with how that was turning out. It seemed a bit confusing to work with a List[Observation] in some places and an Observations object in others. I also couldn't find any good examples of something similar in other API clients out there. Not to say it's not doable, it would just take some more thought.

Answer 6 · 2021-06-03T16:38:20.000Z

I'm all for keeping things simpler. It wasn't something so much that I thought was wrong as something I didn't have a clear idea yet of how it ought to work. Looking for examples from other API clients as a guide would be where I'd start, too.

Answer 7 · 2021-06-03T16:57:05.000Z

Option 3: API client class

Another slightly different option to consider would be a single API client class containing wrappers for all API functions. There are some good examples of that out there, like boto3 again (with Sessions), the other AWS SDKs, and python-gitlab.

Usage could look something like:

client = InatClient()  # Probably needs a better name
observations = client.get_observations(user_id='my_username')

Pros:

A client object could store additional state, like requests.Session settings, rate-limit settings, cache settings (for #158), etc.
Requires fewer imports
Instance methods may be more intuitive than class methods for some users
???

Cons:

At first glance, usage maybe doesn't look quite as nice as <ModelName>.search()
A lot of methods for one class, making docs a little harder to navigate compared to docs for a single resource type
???

I need to think through that one a bit more. Any thoughts on that?

Answer 8 · 2021-06-03T17:08:06.000Z

That's exactly the choice I made in Dronefly. I wouldn't hold up my newbie python code as a shining example to follow, though. :) And yeah, it's ... messy with all those methods and no grouping together per resource type, I agree.

Answer 9 · 2021-06-03T17:16:23.000Z

A hybrid approach? Client object for all the settings and stuff, but with each of the models represented thusly (from Gitlab link you gave):

# list all the projects
projects = gl.projects.list()
for project in projects:
    print(project)

# get the group with id == 2
group = gl.groups.get(2)
for project in group.projects.list():
    print(project)

# create a new user
user_data = {'email': 'jen@foo.com', 'username': 'jen', 'name': 'Jen'}
user = gl.users.create(user_data)
print(user)

We could have inat.observations.search(), etc.

Answer 10 · 2021-06-03T19:45:58.000Z

Yeah, I like the look of that!

Answer 11 · 2021-06-04T14:36:09.000Z

I'm not sure I fully understand the last example, but the settings discussion made me think of the requests approach which I quite like:

you can either call methods (such as get()) directly (nice for newcomers), or via a method of a session object (allows advanced settings, sharing those settings across subsequent calls, ...).

Answer 12 · 2021-06-04T15:43:45.000Z

As for how I understood the last example, inat would be an instance of INatClient (or whatever) containing settings, session, and all the other goodies. inat.observations would be how you organize all the model methods off of the client, i.e. inat.observations.search() would do an Observation.search() within the context of the inat client (settings, session, etc.), I think.

This resolves my earlier discomfort with Observation.search(), i.e. I want to "search the observations" (plural), not "search the observation". inat.observations.search() is a very natural way to express that, and also resolves the objection Jordan raised that putting everything on the client object would add way too many methods to it and not organize them in any fashion.

I guess it's kind of like requests, only different, because we have all of these different kinds of resources to contend with, unlike requests, which is much simpler at the top level. So we share more in common with Gitlab. Unsurprising, since both are APIs for websites with numerous interrelated resources on them.

Answer 13 · 2021-06-04T16:01:46.000Z

Exactly, I see that as a good way to get the benefits of both options 2 and 3 without most of the drawbacks we mentioned. So InatClient.observations would be a "controller" object that contains the wrapper methods for observation requests, and InatClient.taxa for taxon requests, etc., rather than putting all the wrapper methods in a single class.

The only downside is, well, it will be more work, but I'm up for it. I should have time within the next week or so to get a start on this.

Answer 14 · 2021-06-28T15:04:54.000Z

Based on a suggestion from Ben, it would be useful to have some default request params that could be set on the client and then passed to any applicable API requests. Possible params include:

preferred_place_id
locale
per_page

Answer 15 · 2021-07-30T21:44:06.000Z

Continuing this in #217