distributed-text-services/specifications

DTS validation process

Opened this issue · 14 comments

What options, if any, are available to automate the validation of DTS services (or of sets of requests and responses)? Do you publish an official JSON Schema, an OpenAPI specification, or any other tool that could be used to test that a service complies with the DTS spec?

Dear @geoffroy-noel-ddh :)

There are different layers of answer to this question :)

The first one, the most direct, is that we don't currently have this kind of tooling. Unfortunately, JSON Schema and OpenAPI do not work very well (AFAIK) with JSON-LD, which has made the development of such tools a bit complex.

The second one is that we are currently working on #210 (well, we are on vacation right now, but normally), which aims at simplifying the writing for non-JSON-LD users. It would make testing easier for sure.

The last one is that I'd recommend focusing on content and 'mechanics' right now rather than on the expression of the JSON, as we are getting ready to release a draft TWO which is gonna be much cleaner but also includes some schema changes. See for example #208.

I assume we'll resume operations shortly. If you want to give us any feedback on the documentation, feel free :)

Thanks for the prompt & clear response!

On 28-10-2022, we discussed this in our Technical Committee meeting.

@monotasker is gonna try to propose a draft for this kind of functionality for our next meeting (December 2nd).

We aim to try SHACL (while not saying this is what we are gonna use). Probably in a scripted manner...

Three levels of validation would be nice:

  1. Validation of a JSON-LD object
  2. Validation of an HTTP response
  3. Validation of an API.
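
A minimal Python sketch of how these three levels could be layered, each one delegating to the level below. Everything here is illustrative: the function names, the `REQUIRED_KEYS` tuple, the expected status code and the `application/ld+json` content type check are assumptions for the sake of the example, not something the Technical Committee has settled on.

```python
import json

# Illustrative only: a real validator would take these from the DTS spec.
REQUIRED_KEYS = ("@id", "@type", "title")


def validate_object(obj):
    """Level 1: return a list of problems found in one parsed JSON-LD object."""
    if not isinstance(obj, dict):
        return ["body is not a JSON object"]
    return [f"missing key: {key}" for key in REQUIRED_KEYS if key not in obj]


def validate_response(status, headers, body):
    """Level 2: check status and headers, then delegate the body to level 1."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status: {status}")
    # Assumes `headers` is a plain dict with canonical header names.
    content_type = headers.get("Content-Type", "")
    if not content_type.startswith("application/ld+json"):
        problems.append(f"unexpected content type: {content_type!r}")
    try:
        obj = json.loads(body)
    except json.JSONDecodeError as exc:
        return problems + [f"invalid JSON: {exc}"]
    return problems + validate_object(obj)


def validate_api(fetch, endpoints):
    """Level 3: run level 2 on every endpoint.

    `fetch` is any callable mapping a URL to (status, headers, body), so the
    HTTP client can be swapped for a stub in tests.
    """
    return {url: validate_response(*fetch(url)) for url in endpoints}
```

Passing the HTTP client in as `fetch` keeps levels 1 and 2 usable on recorded request/response pairs, without a live service.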

It would be nice to check that the Document reply contains the correct properties on dts:fragment if it's used.

Or, as an alternative to SHACL: (1) parse the JSON with a context/namespace-aware JSON parser, (2) validate it against a JSON Schema.

Hello 👋
During the DTS Hackathon last year there was the idea to hack up a DTS validator, but it never took off.
Some initial thoughts about functionalities/implementation are jotted down here, in case it's of any help.

Thanks @mromanello. I'll be working on this in the next month, so that's a big help.

@monotasker That's great to hear you'll be working on it soon! Do not hesitate to reach out for feedback and if you need alpha testers ;)

Okay, in order to build a validation tool, I had to start with at least one JSON schema to use for the validation. Here is a first draft of a schema for the response object from the Collection endpoint:

{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "http://yourdomain.com/schemas/collection_response_schema.json",
"type": "object",
"properties": {
    "title": {"type": "string"},
    "@id": {
        "type": "string",
        "format": "uri"
    },
    "@type": {
        "type": "string",
        "pattern": "^(Collection|Resource)$"
    },
    "totalItems": {"type": "number"},
    "totalChildren": {"type": "number"},
    "totalParents": {"type": "number"},
    "maxCiteDepth": {"type": "number"},
    "description": {"type": "string"},
    "member": {
        "type": "array",
        "items": {
            "$ref": "#"
        }
    },
    "dublincore": {
        "type": "object"
    },
    "extensions": {
        "type": "object"
    },
    "references": {
        "type": "array",
        "items": {
            "type": "string",
            "format": "uri"
        }
    },
    "passage": {
        "type": "string",
        "format": "uri"
    },
    "download": {
        "anyOf": [
            {"type": "string",
             "format": "uri"},
            {"type": "array",
             "items": {
                "type": "string",
                "format": "uri"
             }
            }
        ]
    },
    "citeStructure": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "citeType": { "type": "string" },
                "citeStructure": {
                    "$ref": "#/properties/citeStructure"
                }
            }
        }
    }
},
"required": [
    "title", "@id", "@type", "totalItems", "totalChildren", "totalParents"
]
}
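
To give a feel for how a draft like this could be exercised, here is a deliberately tiny Python sketch. It is not a JSON Schema validator: it only understands the `required` keyword and the top-level `type` of each property, and it ignores `pattern`, `format`, `$ref` and `anyOf`. The `check` function and the trimmed `SCHEMA_SUBSET` are hypothetical names for this example; a real tool would presumably hand the full schema to a proper validator library.

```python
# Trimmed copy of the draft schema above, limited to the required properties.
SCHEMA_SUBSET = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "@id": {"type": "string"},
        "@type": {"type": "string"},
        "totalItems": {"type": "number"},
        "totalChildren": {"type": "number"},
        "totalParents": {"type": "number"},
    },
    "required": [
        "title", "@id", "@type", "totalItems", "totalChildren", "totalParents"
    ],
}

# JSON Schema type names mapped to the Python types json.loads produces.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "object": dict,
    "array": list,
}


def check(instance, schema):
    """Return a list of problems; an empty list means no problem was found."""
    problems = [
        f"missing required property: {name}"
        for name in schema.get("required", [])
        if name not in instance
    ]
    for name, subschema in schema.get("properties", {}).items():
        if name not in instance:
            continue
        type_name = subschema.get("type")
        expected = JSON_TYPES.get(type_name)
        value = instance[name]
        # bool is a subclass of int in Python, but JSON true/false is not a
        # number or any other type in JSON_TYPES, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            problems.append(f"{name}: expected {type_name}")
    return problems
```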

I'm focusing first on validating response objects. Once that's done I'll look at validating the rest of the HTTP response and validating the endpoint structure for the API implementation.

At the moment I'm looking for a way to validate returned objects from an endpoint directly with a schema like the one above, without the complication of converting the JSON-LD to RDF. I'm still figuring out how to handle the context. If anyone has a pointer in that regard that would speed up my figuring-out here, I'd welcome suggestions. I'm looking at this js library at the moment: https://github.com/mulesoft-labs/json-ld-schema

I think you might want to do the following:

  • Parse the JSON
  • Resolve the @context (using a Python JSON-LD library that only deals with prefixes and does not expand anything else?)
  • Validate against a schema that would contain the fully resolved properties

OR

  • Parse the JSON
  • Apply our @context
  • Validate against the schema.

Option 1 focuses on JSON-LD conformance; option 2 allows for a schema that is readable by developers who are not familiar with JSON-LD.
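
For option 1, the "prefixes only" resolution step could look roughly like the Python sketch below. It is an assumption-laden illustration, not a JSON-LD processor: it only rewrites object keys, it only understands string prefix mappings and `@vocab` in an inline `@context`, and it leaves values untouched. The helper names `expand_term` and `expand_keys` are made up for this example.

```python
def expand_term(term, context):
    """Expand one key using prefix mappings and @vocab from `context`."""
    if term.startswith("@"):
        return term  # JSON-LD keywords such as @id and @type stay as they are
    prefix, sep, local = term.partition(":")
    if sep:
        base = context.get(prefix)
        # Known prefix: expand it. Otherwise assume an absolute IRI or an
        # unknown prefix and return the key unchanged.
        return base + local if isinstance(base, str) else term
    vocab = context.get("@vocab")
    return vocab + term if vocab else term


def expand_keys(node, context):
    """Recursively rewrite keys to full IRIs, dropping the @context itself."""
    if isinstance(node, list):
        return [expand_keys(item, context) for item in node]
    if not isinstance(node, dict):
        return node
    return {
        expand_term(key, context): expand_keys(value, context)
        for key, value in node.items()
        if key != "@context"
    }
```

A schema for option 1 would then name properties by their full IRIs, while a schema for option 2 keeps the short names and relies on the service using the expected context.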

Note:

  • I think you can't self-reference Collection|Resource with # in the members, because the required properties are not the same
  • I think you should require citeType in citeStructure
  • I think it would be great to validate the dublincore object against a specific structure... which is gonna be a lot of work.
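
The first two notes could translate into schema fragments along these lines, written here as a Python dict so they can be dumped to JSON. This is only a sketch: the `$defs` layout is one possible arrangement, and `MEMBER_REQUIRED` is a placeholder list, since which properties a member must carry is not settled in this thread.

```python
import json

# Placeholder: the actual list of properties required on members is an
# assumption here and should come from the spec.
MEMBER_REQUIRED = ["title", "@id", "@type", "totalItems"]

revised_fragments = {
    "$defs": {
        # Members get their own definition instead of {"$ref": "#"}, so their
        # required list can differ from the top-level one.
        "member": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "@id": {"type": "string", "format": "uri"},
                "@type": {"type": "string", "pattern": "^(Collection|Resource)$"},
                "totalItems": {"type": "number"},
            },
            "required": MEMBER_REQUIRED,
        },
        # citeStructure stays recursive, and each entry must have a citeType.
        "citeStructure": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "citeType": {"type": "string"},
                    "citeStructure": {"$ref": "#/$defs/citeStructure"},
                },
                "required": ["citeType"],
            },
        },
    },
    "properties": {
        "member": {"type": "array", "items": {"$ref": "#/$defs/member"}},
        "citeStructure": {"$ref": "#/$defs/citeStructure"},
    },
}

if __name__ == "__main__":
    print(json.dumps(revised_fragments, indent=4))
```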

And a last comment to say it looks great :) And I think it's gonna be a very useful tool.

While working on the DTS validator, I realised that it's hard to test for e.g. the presence of URI template parameters that are mandatory (or not) depending on the compliance level (see issue #233), if the compliance level of an API implementation is not declared somewhere.
Shouldn't it be declared in the Entry endpoint response, similarly to dts-version?
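
To make the problem concrete, here is a rough Python sketch of the check I have in mind. The property name `complianceLevel` and the per-level sets of mandatory parameters are purely hypothetical (that is exactly what is undeclared today), and the URI template parsing only extracts variable names from RFC 6570 style expressions.

```python
import re


def template_variables(template):
    """Return the variable names used in a URI template, in order."""
    names = []
    for expression in re.findall(r"\{([^{}]*)\}", template):
        # Drop a leading operator such as ?, &, /, #, + . or ;
        expression = expression.lstrip("+#./;?&")
        for varspec in expression.split(","):
            # Strip the prefix (:3) and explode (*) modifiers.
            name = varspec.split(":")[0].rstrip("*").strip()
            if name:
                names.append(name)
    return names


def missing_parameters(entry, endpoint, required_by_level,
                       level_key="complianceLevel"):
    """List mandatory parameters absent from an endpoint's URI template.

    Returns None when the Entry response does not declare a level, because
    the validator then cannot know which parameters are mandatory.
    `level_key` is a hypothetical property name.
    """
    level = entry.get(level_key)
    if level is None:
        return None
    declared = set(template_variables(entry[endpoint]))
    return sorted(required_by_level[level] - declared)
```

Without a declared level the function can only return `None`, which is the gap described above.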