dracor-org/dracor-api

Issues with OpenAPI specification

Closed this issue · 12 comments

Hi,

there are some issues with the OpenAPI specification in https://github.com/dracor-org/dracor-api/blob/v1/api.yaml.

Context

I am sitting in the DraCor Onboarding Workshop and trying to break your API, i.e. testing the API schema, with permission of @lehkost. I am not an attacker. :)

I am using the Python library schemathesis. Another popular framework would be dredd.

$ pip install schemathesis
$ schemathesis run https://staging.dracor.org/api/openapi.yaml

Storing and replaying test cases
https://schemathesis.readthedocs.io/en/stable/cli.html#storing-and-replaying-test-cases

$ schemathesis run --cassette-path cassette.yaml https://staging.dracor.org/api/openapi.yaml

Findings

/id/{id}

There is no 200 response defined, only 303 and 404.

dracor-api/api.yaml

Lines 1008 to 1009 in 82dd3d4

responses:
  '303':

400

Notes

https://schemathesis.readthedocs.io/en/stable/how.html#payload-serialization
https://schemathesis.readthedocs.io/en/stable/cli.html#storing-and-replaying-test-cases

The id endpoint only issues a redirect depending on the Accept header. 303 is the right status code. See the specifications on "Cool URIs". It is also explained in our report On Programmable Corpora, https://zenodo.org/record/7664964, p. 33.

Hi, I am testing the API during the DraCor workshop and will post a few more examples. Alright?

Please feel free to ignore. :)

Can you give me a curl example with the correct HTTP header please?

As stated in the /id/{id} endpoint description, you can specifically request either the JSON or the RDF representation by sending the Accept header with the respective type. Any request with another Accept header, or none at all, will be redirected to the web page of the play. See:

curl -i -H 'Accept: application/json'  https://staging.dracor.org/id/ger000001
curl -i -H 'Accept: application/rdf+xml'  https://staging.dracor.org/id/ger000001
curl -i  https://staging.dracor.org/id/ger000001

Do we need to make that more clear?
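For anyone who wants to script the same check, here is a minimal Python sketch of the three curl calls above. It uses only the standard library; `http.client` does not follow redirects, so the 303 and its `Location` header stay visible. The helper names `request_parts` and `fetch_redirect` are made up for this example and are not part of the DraCor API or of any client library.

```python
import http.client
from urllib.parse import urlsplit


def request_parts(url, accept=None):
    """Split a URL into (host, path, headers) for a single GET request."""
    parts = urlsplit(url)
    # Only send an Accept header when one was asked for.
    headers = {"Accept": accept} if accept else {}
    return parts.netloc, parts.path or "/", headers


def fetch_redirect(url, accept=None, timeout=10):
    """Send one GET without following redirects; return (status, Location)."""
    host, path, headers = request_parts(url, accept)
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling `fetch_redirect("https://staging.dracor.org/id/ger000001", "application/json")` and then repeating the call with `"application/rdf+xml"` and with no Accept value should, going by the endpoint description, show the status code and a different `Location` target for each case.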

@cmil I saw that you already submitted a fix.

I am unsure how to proceed. Should I

  • edit my original issue every time with new findings?
  • open a new issue per finding? Might be overkill
  • add a checklist of some sort, so that fixed findings can be seen more easily?

Something like this?

404 Responses

  • 404 not defined: /api/wikidata/author/{id}
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/metrics
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/tei
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/rdf
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/cast
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/cast/csv
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/networkdata/csv
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/networkdata/gexf
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/networkdata/graphml
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/spoken-text
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/stage-directions
  • 404 not defined: /api/corpora/{corpusname}/play/{playname}/stage-directions-with-speakers
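A list like this could also be produced straight from the spec instead of from test runs. Below is a rough sketch, assuming the OpenAPI document has already been parsed into a Python dict (for example with a YAML parser, which is not shown here). The function name `operations_missing_status` and the tiny example spec are made up for illustration; the sketch only looks for an exact status key and ignores `default` or range keys such as `4XX`.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operations_missing_status(spec, status):
    """Return (METHOD, path) pairs whose responses lack the given status code."""
    status = str(status)
    missing = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            responses = operation.get("responses", {})
            # YAML may yield 404 (int) or '404' (str) as keys; compare as text.
            if status not in {str(code) for code in responses}:
                missing.append((method.upper(), path))
    return sorted(missing, key=lambda pair: (pair[1], pair[0]))


# Made-up two-path spec, only for demonstration.
example_spec = {
    "paths": {
        "/example/{id}": {
            "get": {"responses": {"200": {"description": "OK"},
                                  "404": {"description": "Not found"}}},
        },
        "/api/corpora/{corpusname}/play/{playname}": {
            "parameters": [],
            "get": {"responses": {"200": {"description": "OK"}}},
        },
    }
}

print(operations_missing_status(example_spec, 404))
# prints [('GET', '/api/corpora/{corpusname}/play/{playname}')]
```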

Is there a DEBUG setting enabled in your server?

Or could you serve a custom error page for HTTP 500 errors?

I am not familiar with Jetty, but this might be it? The ErrorPageErrorHandler?
https://eclipse.dev/jetty/javadoc/jetty-9/org/eclipse/jetty/servlet/ErrorPageErrorHandler.html

An internal server error should probably not expose the stack trace?

Example: https://staging.dracor.org/api/corpora/0/play/%3A/metrics
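To keep an eye on this in automated tests, a check along the following lines might help. It is only a heuristic sketch: it assumes the leaked trace contains Java-style frames of the form `at package.Class.method(File.java:123)`, and the function name `leaks_stack_trace` is made up for this example.

```python
import re

# A Java stack frame looks like "at org.example.Foo.bar(Foo.java:12)".
_JAVA_FRAME = re.compile(r"^\s*at\s+[\w.$<>]+\([^)]*\)\s*$", re.MULTILINE)


def leaks_stack_trace(body):
    """Return True if the response body contains a Java-style stack frame."""
    return _JAVA_FRAME.search(body) is not None
```

A test could then assert that `leaks_stack_trace(response_text)` is false for every 5xx response. Error pages that HTML-escape or otherwise reformat the trace would need a looser pattern.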

401

The endpoints tagged as admin can return 401, which is not defined in the schema. Maybe you could add a 401 response component under components and reuse it with a $ref into "#/components/responses".

components:
[...]
  responses:
    '401':
      description: Authorization information is missing or invalid.

With the contents {"message" : "authorization required"}

But I am not sure if this is necessary.

Refs: https://swagger.io/docs/specification/describing-responses/
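To illustrate the reuse idea, here is a small sketch that works on an already parsed spec (a plain Python dict). It registers one shared response under `components/responses` and points every operation tagged `admin` at it. The function name `add_shared_401` is made up, and the component name `'401'` is an assumption taken from the snippet above; a descriptive name would work just as well.

```python
import copy

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def add_shared_401(spec, tag="admin", name="401"):
    """Return a copy of spec where operations with the tag reference a shared 401."""
    result = copy.deepcopy(spec)  # leave the caller's dict untouched
    shared = result.setdefault("components", {}).setdefault("responses", {})
    shared.setdefault(
        name, {"description": "Authorization information is missing or invalid."}
    )
    ref = {"$ref": f"#/components/responses/{name}"}
    for item in result.get("paths", {}).values():
        for method, operation in item.items():
            if method in HTTP_METHODS and tag in operation.get("tags", []):
                # Keep an existing 401 definition if the operation already has one.
                operation.setdefault("responses", {}).setdefault("401", dict(ref))
    return result
```

Operations without the tag are left as they are, so the change stays limited to the endpoints that can actually answer with 401.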

@afuetterer Thanks a lot for your debugging efforts! They help greatly to improve the consistency and stability of the DraCor API. Could I still ask you to create an individual issue for each new finding, i.e. for each of the things you listed under its own subheading in the original post (but not for each individual endpoint that has the same problem, as in #205 (comment))? Separate issues make it much easier to discuss and solve problems step by step and to track progress. Collecting several problems in one issue, even if they seem related, can make the discussion unnecessarily difficult and long-winded and lead to the issue staying unresolved for a long time.

For the current issue, you can leave it as is, but please don't add new findings. I will come back to individual comments as I find time.

A new issue would then be for an "atomic" problem, right?

endpoint + problem + steps to reproduce

e.g.

/api/corpora/{corpusname}/play/{playname} - 404 not defined - https://staging.dracor.org/api/corpora/0/play/0

Yes, atomicity is a good criterion. Something that could be taken care of in an individual step, possibly by a single person, should be a separate issue. The 404 problem would be such a case, although that should be a single issue for all affected endpoints, not one issue per endpoint.

Closing this. Feel free to reopen.