decentralized-identity/presentation-exchange

Clarify filtering examples

jmandel opened this issue · 5 comments

Section 5 introduces filtering examples.

The "Filter by Credential Type" example includes:

"path": [
  "$.type"
],
"filter": {
  "type": "string",
  "pattern": "<the type of VC e.g. degree certificate>"
}

Typically a verifiable credential will have an array of types (starting with the base type, https://www.w3.org/2018/credentials#VerifiableCredential). But this filter expects the type property to evaluate to a single string, if I am reading the processing algorithm correctly. Does the example need to be rewritten as a filter with "type": "array", "contains": { ... to handle the typical case?

More generally... many fields in the VC data model can be represented as single values or arrays. To make filter logic that is robust, I suppose it becomes necessary to write filters that can match single values or arrays.

The filter object is quite literally any valid JSON Schema object, so you should be able to write a JSON Schema definition that tests values just about any way one can imagine, I'd think. In JSON Schema you can test arrays to make sure they contain certain elements, are of certain data types, match against a regexp, etc., so I believe your case is covered, but I'm curious what your exact test is, because I could probably write up a quick JSON Schema snippet to do it.

Here's a JSON Schema that tests all items in an array for adherence to a simple restriction that all members of the array must be strings that are either "foo" or "bar":

{
  "type": "array",
  "items": {
    "type": "string",
    "enum": [ "foo", "bar" ]
  }
}

Considering how expressive JSON Schema is, I can't imagine there's a check on a value that can't be accomplished with it.

Thanks for the quick response!

> curious what your exact test is

Let's say I have a VC like Example 1 from the VC Data Model specification:

{
  
  "@context": [
    "https://www.w3.org/2018/credentials/v1", "https://www.w3.org/2018/credentials/examples/v1"
  ],
  "id": "http://example.edu/credentials/1872",
  "type": ["VerifiableCredential", "AlumniCredential"],

Developers looking at the DIF "filter by type" example would try to write an expression like:

"path": [
  "$.type"
],
"filter": {
  "type": "string",
  "pattern": "AlumniCredential"
}

But they will be surprised and disappointed when the expression fails to match real VCs (i.e., the filter assumes a scalar type when in all cases I am aware of, the type will be an array). So the fix here is to make sure the DIF "filter by type" example expects arrays. (It still isn't quite right because it implicitly depends on a certain @context being in place, but that feels like a harder issue to fix, and points to an impedance mismatch between this filtering approach and the use of JSON-LD -- but that deserves a separate issue.)
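For illustration, an array-aware version of that filter might look like the sketch below. It assumes the implementation evaluates `filter` under a JSON Schema draft that supports `contains` (draft-06 or later), and it abbreviates the field object to just `path` and `filter`, as in the snippet above. The anchored pattern is my own choice, not something taken from the spec example.

```json
{
  "path": ["$.type"],
  "filter": {
    "type": "array",
    "contains": {
      "type": "string",
      "pattern": "^AlumniCredential$"
    }
  }
}
```

Against a `type` of ["VerifiableCredential", "AlumniCredential"], this should match because at least one array element satisfies the inner schema. JSON Schema patterns are not implicitly anchored, so without `^` and `$` the pattern would also accept any string that merely contains "AlumniCredential".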

The slightly broader question is: can we provide examples in the specification that are robust to the very common scenario where you don't know ahead of time whether you're going to get a scalar or an array? That complexity seems to come with the territory of the VC data model. In particular, the "Two Filters" example that currently looks for terms of use based on two properties could use this kind of treatment, because terms of use can be either a scalar or an array and the current example will only find values that happen to be scalars.
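One possible shape for a filter that tolerates both representations is sketched below, using `anyOf` to accept either a single string or an array containing it. This assumes a JSON Schema draft that supports `const` and `contains` (draft-06 or later). It reuses the `type` field and the AlumniCredential value from the example above purely for illustration; it is not the spec's "Two Filters" example.

```json
{
  "path": ["$.type"],
  "filter": {
    "anyOf": [
      {
        "type": "string",
        "const": "AlumniCredential"
      },
      {
        "type": "array",
        "contains": {
          "type": "string",
          "const": "AlumniCredential"
        }
      }
    ]
  }
}
```

The same `anyOf` wrapper should carry over to other fields that may be a single value or an array, with the inner schema swapped for whatever test applies to one value. The cost is that every such filter roughly doubles in size, which is part of why the examples are hard to keep simple.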

(I understand the benefit of creating very simple examples to help developers figure out what's going on, but misleading examples can cause enduring confusion.)

Yeah, I guess we put that simple example in to not overload people, but you're right that a more precise example of the most common case is probably needed. Would you be interested in doing a PR to modify one to be more representative of the case you outlined?

This issue was addressed in two ways:

  • #423 to include new test vectors without overwriting
  • #424 to address rework of examples (realism, etc)