airr-community/common-repo-wg

ADC API not and is operators

Closed this issue · 3 comments

@schristley are the not and is operators as documented here:

https://github.com/airr-community/airr-standards/blob/metadata-docs/docs/api/overview.rst#request-parameters

from the GDC API? I find that writing a query with them is very cumbersome. If this is GDC based, do they explain why they do it this way?

The following looks for things that have a clone ID and a specific V Gene, and then retrieves the junction for each clone:

{
    "filters": {"op":"and", "content": [
        {"op":"not", "content": {"field":"clone_id"}},
        {"op":"contains","content": {"field":"v_call","value":"IGHV3-30"}}
    ]},
    "fields":["clone_id","junction_aa"]
} 
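As a minimal sketch, the same query can be assembled in Python so that the nesting and the key/value pairs are generated rather than typed by hand. The helper names (`not_missing`, `contains`, `all_of`) are hypothetical and not part of the API; only the `op`/`content`/`field`/`value` layout is taken from the query above.

```python
import json


def not_missing(field):
    """Clause meaning 'field is not missing', as the docs describe the "not" op."""
    return {"op": "not", "content": {"field": field}}


def contains(field, value):
    """Clause matching records whose field contains the given value."""
    return {"op": "contains", "content": {"field": field, "value": value}}


def all_of(*clauses):
    """Combine clauses with the "and" op."""
    return {"op": "and", "content": list(clauses)}


# Rearrangements that have a clone_id and a v_call containing IGHV3-30,
# returning only clone_id and junction_aa.
query = {
    "filters": all_of(not_missing("clone_id"), contains("v_call", "IGHV3-30")),
    "fields": ["clone_id", "junction_aa"],
}

# Serialize to the JSON request body.
body = json.dumps(query, indent=4)
print(body)
```

Serializing with `json.dumps` also guarantees the request body is valid JSON, which is easy to get wrong when writing nested clauses by hand.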

According to the docs the clause

{"op":"not", "content": {"field":"clone_id"}}

returns true if clone_id is "not missing", which is what I want. For a given rearrangement, the query returns values only if the rearrangement represents a clone and has the specified v_call.

This would be much clearer if it were something like:

{"op":"notmissing", "content": {"field":"clone_id"}}
{"op":"exists", "content": {"field":"clone_id"}}

Maybe having an "exists"/"missing" or "notmissing"/"missing" pair of ops would be clearer?

from the GDC API?

Yes, I copied them over exactly. No, they don't really explain them and they didn't provide an example.

I remember thinking about the Mongo implementation, as the $exists operator has the behavior that it "matches documents that contain the field, including documents where the field value is null." So it's an odd operator in that it acts on the field itself, not on its value.
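To make the field-versus-value distinction concrete, here is a small sketch using plain Python dicts in place of stored documents. It only emulates the quoted behavior and does not talk to Mongo; the three records and the helper names are made up for illustration.

```python
# Hypothetical records: one with a clone_id value, one with an explicit
# null, and one where the field is absent entirely.
docs = [
    {"sequence_id": "a", "clone_id": "7"},
    {"sequence_id": "b", "clone_id": None},
    {"sequence_id": "c"},
]


def field_exists(doc, field):
    """Acts on the field itself: true even when the stored value is null."""
    return field in doc


def field_not_null(doc, field):
    """Acts on the value: true only when the field is present and non-null."""
    return doc.get(field) is not None


exists_ids = [d["sequence_id"] for d in docs if field_exists(d, "clone_id")]
not_null_ids = [d["sequence_id"] for d in docs if field_not_null(d, "clone_id")]

print(exists_ids)    # ['a', 'b']
print(not_null_ids)  # ['a']
```

Record "b" is where the two readings disagree, so a "not missing" operator has to say which of the two it means.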

It isn't clear to me what your issue is. Are you getting the wrong data? I don't understand the difference between your suggested operators and what they mean.

I don't think any of the VDJServer data has clone_id, plus the data I loaded is from an old IgBlast (1.9 I think) so there are flaws in it.

No, the data is fine - in your case there isn't any clone data, and I figured that out 8-). I was trying to "implement" a query that ImmuneDB uses as an example, to demonstrate how the AIRR API could be used on ImmuneDB. The query needs to check to see if a rearrangement is a clone, and the way I did that was by checking to see if the rearrangement had a clone_id.

The problem is that the following clause:

{"op":"not", "content": {"field":"clone_id"}}

means

clone_id is not missing

From a clarity and understanding point of view, I find this very confusing.

  1. It uses "not" as an operator in a way that is not typical from a boolean/query perspective. That alone is, I think, an issue.
  2. I find the semantics of "not(clone_id)" being equivalent to "clone_id is not missing" quite confusing. Given that not isn't a unary boolean operator, I would have thought "not(clone_id)" would be equivalent to "clone_id IS missing", exactly the opposite of the meaning that the operator has.

I agree it's not the best semantics, as we went with GDC compatibility, but UIs can hide this from users and present it in a more palatable manner, and we have documentation that describes the operators for developers.
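One way a client or UI layer could hide the GDC-style names is a thin translation step, sketched below. It accepts the friendlier "exists"/"notmissing"/"missing" ops suggested earlier in this thread and rewrites them before the query is sent. This is an assumption-laden illustration, not part of the API: the alias table and the `to_api_filter` function are hypothetical, and mapping "missing" to "is" assumes that "is" is the complementary op meaning "is missing", as the issue title suggests.

```python
# Hypothetical aliases mapped onto the documented ops. Assumption: "is"
# means "is missing" and "not" means "is not missing".
ALIASES = {"exists": "not", "notmissing": "not", "missing": "is"}


def to_api_filter(clause):
    """Return a copy of the filter with friendly op names rewritten."""
    op = clause["op"]
    content = clause["content"]
    if isinstance(content, list):
        # Logical ops such as "and" carry a list of nested clauses.
        return {"op": op, "content": [to_api_filter(c) for c in content]}
    # Leaf clause: rename the op if it is an alias, otherwise keep it.
    return {"op": ALIASES.get(op, op), "content": dict(content)}


friendly = {"op": "and", "content": [
    {"op": "exists", "content": {"field": "clone_id"}},
    {"op": "contains", "content": {"field": "v_call", "value": "IGHV3-30"}},
]}

translated = to_api_filter(friendly)
print(translated)
```

Ops that are not in the alias table pass through unchanged, so queries already written with "not" and "is" keep working.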