anvilresearch/connect

Filter returned users from anvil.users.list()

saikojosh opened this issue · 4 comments

We need the ability to filter anvil.users.list() by the document properties. At the moment there's no way to specify options when querying the User collection: /routes/rest/v1/users.js:35.

Specifically we want to be able to:

  • Filter by User ID.
  • (Potentially) filter by other properties on the document.
  • Select only certain fields to be returned.

This will help us reduce overhead because we won't need to query the entire user collection just to pull out a handful of documents.

To expand on Josh's comments:

anvil.users.list() is a function in connect-nodejs. To allow options to be passed through to the REST API, we would need to 1) define a consistent way of passing options, 2) define which options are allowed, and 3) figure out how best to do the filtering.

  1. For filtering results, URL query parameters should do the job well

  2. My thoughts on filtering options, which for the moment I'll only comment on for the route GET /v1/users:

  • Being able to filter by a range of user IDs would be very useful
  • Specifying the maximum number of documents to be returned. At the moment there is a default 'size' set in modinha-redis (https://github.com/anvilresearch/modinha-redis/blob/master/lib/RedisDocument.js#L41) and no option to change it.
  • Filtering by specific document fields might be more complex and is something the client could do, so unless there's an all-encompassing solution this could be split into a separate ticket
  3. Passing query parameters directly through as the options in User.list() could be a quick way to specify the number of documents. Would it also work for specifying a range?
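
To make the 'size' idea concrete, here is a minimal sketch of turning a raw query string value into a bounded numeric option. It assumes an Express-style query object (such as req.query); parseSize, DEFAULT_SIZE and MAX_SIZE are hypothetical names and values for illustration, not anything defined in connect or modinha-redis.

```javascript
// Hypothetical helper: read `size` from a raw query object (e.g. req.query)
// and return a safe integer. DEFAULT_SIZE and MAX_SIZE are illustrative
// values only; they are NOT the defaults used by modinha-redis.
const DEFAULT_SIZE = 50
const MAX_SIZE = 1000

function parseSize (query) {
  const raw = query && query.size
  // Accept only plain decimal digits, so "10abc", "-5" and array values
  // (e.g. ?size=1&size=2) all fall back to the default.
  if (typeof raw !== 'string' || !/^\d+$/.test(raw)) return DEFAULT_SIZE
  const n = parseInt(raw, 10)
  if (n < 1) return DEFAULT_SIZE
  // Cap the result so a client cannot request the whole collection at once.
  return Math.min(n, MAX_SIZE)
}
```

The route handler could then pass something like { size: parseSize(req.query) } to User.list(). The upper bound is a design choice: without one, the endpoint is back to returning the entire collection on demand.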

I've got code working to specify the number of documents here:

https://github.com/hedleysmith/connect/commits/user-endpoint-options
https://github.com/hedleysmith/connect-nodejs/tree/user-endpoint-options

Could add PRs if this seems like a sensible way to go, or could figure out more on the PRs first?

Let me give a little background on this subject while we're thinking it through.

The idea when we started building Anvil Connect was that user data would be limited in scope to identity-related attributes and leave domain-specific profile data and ad hoc querying to a separate microservice. There are a number of reasons this seemed like the way to go at the time.

Given that constraint, there would be no need for the general queryability offered by SQL or Mongo-style backends. We wanted to keep this light, fast, close to the metal. We expected that if we needed more sophisticated indexing we'd use something like ElasticSearch. The indexing currently done in Redis is limited to the simple lookups needed by program logic, and unfortunately there isn't a great way to scan across the records in a map-reduce kind of way.

After getting some real-world experience and feedback from other users, we're changing our thinking on this. For many Anvil Connect users, there are domain-specific user attributes that could end up playing a role in access control (apologies for the pun, I couldn't resist), and with a very large number of user accounts, "search results" can be useful. If we're going to allow for extensible user schemas, it only makes sense to have more flexible querying.

These things are being taken into careful consideration for the next generation of Anvil Connect. In the meantime...

To be pedantic, it's not quite possible to filter on a range of user IDs, because being UUIDs they are not sequential, and at any rate I'm not sure Redis hashes are ordered by field. Requesting multiple users if you already have all the IDs is feasible. IIRC, User.get() can already take an array of IDs as an argument. Under the hood that method uses the Redis HMGET command.

Selecting specific fields only is certainly possible as well. There's a select option that can be passed to User.list() or User.get().

The only thing stopping us from a "multiget" of users, attribute selection, paging, and controlling the size of the result set via the API is mapping request params to options in the User method calls.

@hedleysmith It's imperative that we restrict which params can be used for options for security purposes. There are a few options that are really intended for internal use only.
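
One way to enforce that restriction is to copy only explicitly allowed keys from the request into the options object, so anything else a client sends is dropped. The sketch below is an illustration under assumptions: pickOptions and ALLOWED_PARAMS are hypothetical names, and while 'size' and 'select' are mentioned in this thread, 'page' is a guessed option name for paging.

```javascript
// Hypothetical whitelist of query parameters a client may set.
// 'size' and 'select' come from this discussion; 'page' is an assumed name.
const ALLOWED_PARAMS = ['size', 'page', 'select']

function pickOptions (query) {
  const options = {}
  for (const key of ALLOWED_PARAMS) {
    // Copy own properties only, so nothing inherited leaks through.
    if (Object.prototype.hasOwnProperty.call(query, key)) {
      options[key] = query[key]
    }
  }
  return options
}
```

Building a fresh object from a fixed list, rather than deleting known-bad keys from the request, means any internal-only option added later stays unreachable from the API by default.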

Glad to pair on this with anyone that wants to put some effort into fleshing out this part of the API, and look forward to reviewing PRs. Thanks in advance.

Hey Cristian,

Thanks for all the background info on this, very useful to understand. Also good to hear about the plans for the next iteration of Anvil Connect relating to these ideas.

Seems like this is all technically possible and that the underlying functionality in Redis / Modinha will already support what we want to achieve.

Yes, let's set up a time to talk this through. I'll ping you on Gitter to figure out when would be a good time.

Just to recap, mainly for my own benefit, I think the main requirements and questions are:

  • Specify multiple users to return, e.g. GET /users/:id,:id,:id...
  • Size of returned user list, e.g. GET /users?size=500
  • Select specific fields only, e.g. GET /users?fields=telephone,givenname,familyname... (note: would using custom scopes instead be a more sane approach to this, or could both be useful?)
  1. Confirm the best API design for these updates (this could be useful for the next iteration of Anvil Connect as well?)
  2. Figure out how we can safely allow these request parameters to be passed through (I'm guessing storing a list of acceptable parameters and ignoring everything else would make sense, and making sure we also protect against XSS attacks)
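
As a rough sketch of how the comma-separated forms above might be parsed and validated before they reach the model layer: parseIdList, parseFields, UUID_RE and SELECTABLE_FIELDS are all hypothetical names, and the field list simply reuses the example fields from the recap rather than the real User schema.

```javascript
// Loose UUID shape check (8-4-4-4-12 hex digits). It does not verify the
// UUID version; it only rejects values that are obviously not IDs.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

// Split "id,id,id" into an array, keeping only well-formed IDs.
function parseIdList (param) {
  return String(param)
    .split(',')
    .map(s => s.trim())
    .filter(s => UUID_RE.test(s))
}

// Illustrative list of fields a client may select; not the real schema.
const SELECTABLE_FIELDS = ['telephone', 'givenname', 'familyname']

// Split "a,b,c" into an array, keeping only fields on the list above.
function parseFields (param) {
  return String(param)
    .split(',')
    .map(s => s.trim())
    .filter(f => SELECTABLE_FIELDS.includes(f))
}
```

This version silently drops anything that fails validation. Responding with a 400 for malformed IDs or unknown fields would be a reasonable alternative, and is probably something to settle as part of the API design question above.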

👍