opencontainers/distribution-spec

Proposal: refactor conformance tests

rchincha opened this issue · 14 comments

The feedback about the conformance tests is that they have been useful, but the UX in general is terrible.

Here are some UX requirements from the last OCI meeting (08/01/2024)

[ ] Root-cause analysis of test failures should be easier/clearer

[ ] Conformance tests should be configuration (config-file) driven (see the sketch after this list), so that

  • the MAY, MUST, SHOULD, etc. requirement levels can be set, and individual tests and suites can be turned on or off
  • the config-file can be shared, so that it can be used to verify results and to derive new configurations
  • OCI specs can publish a recommended config-file on GA

[ ] Clients' expectations (of registries) from conformance tests

[ ] Registries' expectations (of clients) from conformance tests

[ ] A website revamp for the results/config matrix
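As a rough illustration of the config-file idea above, here is a minimal sketch of what such a shared configuration could map onto, expressed as Go types it might unmarshal into. All names, fields, and levels here are hypothetical, not part of any spec:

```go
// Hypothetical shape of a shared conformance config: requirement levels per
// suite/test plus on/off switches, so a registry or client can publish its
// config and others can verify against it or derive their own.
package conformance

type RequirementLevel string

const (
	Must   RequirementLevel = "MUST"
	Should RequirementLevel = "SHOULD"
	May    RequirementLevel = "MAY"
)

// SuiteConfig enables or disables a suite, sets its default level, and allows
// optional per-test overrides.
type SuiteConfig struct {
	Enabled bool                        `json:"enabled"`
	Level   RequirementLevel            `json:"level"`
	Tests   map[string]RequirementLevel `json:"tests,omitempty"`
}

// Config maps suite names (e.g. "pull", "push", "discovery") to their settings.
type Config struct {
	Suites map[string]SuiteConfig `json:"suites"`
}
```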

I've volunteered to work on the registry server tests when time permits (additional help is always appreciated). Someone else would need to write the tests for clients. LF/OCI may help with a resource to work on the website, but a volunteer to help with that would also be useful.

Some notes about Ginkgo, which is the testing framework we use in the conformance tests. It is quite powerful, and we may only be using a subset of its features.

https://onsi.github.io/ginkgo/#filtering-specs

https://onsi.github.io/ginkgo/#spec-labels

https://onsi.github.io/ginkgo/#table-specs

https://onsi.github.io/ginkgo/#controlling-ginkgos-output
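For example, labels and table specs already cover much of the filtering requirement. A minimal sketch (not the actual conformance suite; the spec names and entries are made up) of how requirement levels could be expressed as Ginkgo v2 labels and then filtered from the CLI:

```go
// Run with, e.g.:  ginkgo --label-filter='pull && !SHOULD'
package conformance_test

import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("Pull", Label("pull"), func() {
	It("returns a manifest for a known tag", Label("MUST"), func() {
		// ... GET /v2/<name>/manifests/<tag> and assert on the response
		Expect(true).To(BeTrue()) // placeholder assertion
	})

	DescribeTable("rejects invalid references", Label("SHOULD"),
		func(ref string) {
			// ... expect a 4xx response for the invalid reference
			Expect(ref).NotTo(BeEmpty()) // placeholder assertion
		},
		Entry("uppercase repository name", "Foo/bar:latest"),
		Entry("malformed digest", "foo/bar@sha256:xyz"),
	)
})
```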

My preference is to use the Go stdlib where possible. My goal is to minimize the number of external libraries and frameworks someone needs to learn to submit a PR to the project.

I would like to volunteer.

Oh, I also wasn't volunteering to redesign the configuration around a configuration file. My plan was to stick with environment variables, maybe CLI parameters, to simplify the integration with GitHub actions and similar CI tooling. Someone else would need to work on that task.
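A minimal sketch of that env-var-first approach with an optional CLI flag fallback; the flag names and defaults here are illustrative, with OCI_ROOT_URL/OCI_NAMESPACE being the kind of variables the existing suite already documents:

```go
// Sketch only: read configuration from environment variables first, with CLI
// flags as an optional override, so CI integration stays a matter of setting
// env vars in the workflow file.
package main

import (
	"flag"
	"fmt"
	"os"
)

// envOr returns the value of the environment variable name, or def if it is unset.
func envOr(name, def string) string {
	if v, ok := os.LookupEnv(name); ok {
		return v
	}
	return def
}

func main() {
	rootURL := flag.String("root-url", envOr("OCI_ROOT_URL", ""), "registry root URL")
	namespace := flag.String("namespace", envOr("OCI_NAMESPACE", "conformance/test"), "repository to test against")
	flag.Parse()

	fmt.Printf("testing %s in %s\n", *namespace, *rootURL)
}
```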

I'm worried this issue is very broad and unlikely to be fully resolved (who's writing the conformance tests for client tools?), particularly not by a single PR. Complex tasks that require a lot of work are being listed as single bullet points with little detail of the requirements. This may be better off being closed, with individual issues or PRs opened under the conformance label (assuming they don't already exist).

As an aside, I may try rewriting it as a standard Go binary instead of with the Go test interface it is using now. That would simplify building it and also allow the conformance test to be tested itself.
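A rough sketch of what that "standard Go binary" shape could look like, with checks as plain functions so the runner itself can be unit-tested; all names here are made up for illustration:

```go
// Sketch only: checks are plain functions returning errors, so `go build`
// produces a self-contained conformance binary and the runner logic can be
// tested like any other code.
package main

import (
	"fmt"
	"os"
)

// Check is a single conformance check with a human-readable name.
type Check struct {
	Name string
	Run  func() error
}

// runChecks executes every check and reports each result on its own line.
func runChecks(checks []Check) (failed int) {
	for _, c := range checks {
		if err := c.Run(); err != nil {
			failed++
			fmt.Printf("FAIL %s: %v\n", c.Name, err)
			continue
		}
		fmt.Printf("PASS %s\n", c.Name)
	}
	return failed
}

func main() {
	checks := []Check{
		{Name: "pull: GET manifest by tag", Run: func() error { return nil /* ... */ }},
	}
	if runChecks(checks) > 0 {
		os.Exit(1)
	}
}
```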

> My preference is to use the Go stdlib where possible. My goal is to minimize the number of external libraries and frameworks someone needs to learn to submit a PR to the project.

I have the opposite concern: that we may NIH this (reinvent it in-house) and just end up with the same functionality.

Okay, I'll hold off on doing anything pending agreement on these concerns.

Some more notes ...

If test configuration is passed as env vars:

"pull.subtest1.arg2.subarg3=X"

(vs)

as a config file (which can be downloaded, shared, derived, and shared again):

{
  "pull": {
    "subtest1": {
      "arg2": {
        "subarg3": X
      }
    }
  }
}
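A small sketch (with a deliberately generic, hypothetical type rather than a proposed schema) of how such a nested config file could be loaded in Go, which is what makes it easy to check in, share, and derive compared to a flat set of env var strings:

```go
// Sketch only: load a nested JSON config like the example above.
package conformance

import (
	"encoding/json"
	"os"
)

// Config is the parsed tree, e.g. cfg["pull"].(map[string]any)["subtest1"] ...
type Config map[string]any

// LoadConfig reads and parses a shared config file.
func LoadConfig(path string) (Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}
```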

I have been looking at the tests and I have found the following issues:

  1. The Workflows enumeration items are being used as bit flags, but they are also used, via numWorkflows, as the bound of the loops that generate configs and manifests. And numWorkflows, being the last enumeration item of a 5-bit-long bitmask, has the value 64! I don't think 64 configs and manifests are needed for the tests; somehow that enumeration is being misused (see the sketch after this list).
  2. Either I am missing it, or there is no "Blob Upload Streamed" according to the specification, yet there is a test for it! Is it a non-specified workflow that appears in some implementations?
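To make point 1 concrete, here is an illustration of that pattern; this is not the actual setup.go, and the constant names and values are only illustrative:

```go
package conformance

// Flag-style enumeration: each workflow is a bit, so the trailing
// "numWorkflows" item is itself a power of two, not a count of workflows.
const (
	pull              = 1 << iota // 1
	push                          // 2
	contentDiscovery              // 4
	contentManagement             // 8
	numWorkflows                  // 16 -- a flag value, not the number of workflows
)

// Looping up to numWorkflows iterates over far more values than there are
// workflows, generating unnecessary configs and manifests; a separate count
// constant (or plain iota without the shift) would avoid that.
func buildTestData() {
	for i := 0; i < numWorkflows; i++ {
		_ = i // ... generate config/manifest for index i
	}
}
```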

In general, setup.go:init is IMHO much too complex and convoluted, and as a result it is poorly aligned with the tests: there is no clear use case for everything that is being specified there.

> My preference is to use the Go stdlib where possible. My goal is to minimize the number of external libraries and frameworks someone needs to learn to submit a PR to the project.

Although I am not a fan of adding too many libraries, and initially was wondering why not simply use the Go stdlib... I want to use the JUnit reporting on GitLab (the system used at my company), and the report coming from Ginkgo is much more readable than the one from the stdlib. So I see advantages to both approaches.
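For reference, a small sketch of how the Ginkgo v2 JUnit output can be produced so that GitLab's JUnit report integration picks it up; the same thing can also be done from the CLI with `ginkgo --junit-report=junit.xml`:

```go
package conformance_test

import (
	. "github.com/onsi/ginkgo/v2"
	"github.com/onsi/ginkgo/v2/reporters"
)

// Write a JUnit XML report after the suite finishes, in addition to the
// normal Ginkgo output.
var _ = ReportAfterSuite("junit report", func(report Report) {
	_ = reporters.GenerateJUnitReport(report, "junit.xml")
})
```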

I have had a deeper look at the tests and done some experiments, resulting in the PoC that you can find in this branch of my fork. It is a single commit b59cf09 covering only part of the tests, but hopefully enough to illustrate my point.

It basically demonstrates

  • how finer granularity can help give a more complete view of what fails, and
  • how better verbosity can help in understanding why it fails.

Finer granularity

If you compare the report of the current tests (see report.main.github.html.zip for the whole HTML report) with the one from my branch (see report.silvanoc.github.html.zip for the whole HTML report), you will quickly see that all uploads of manifests with custom artifact types and subject are failing on GitHub.

These screenshots illustrate it:

Based on current main: [screenshot]

Based on my branch: [screenshot]

But having finer granularity alone is not enough...

Better verbosity

Looking at the above screenshot of my branch, you see that all failing tests report "The expected 'OCI-Subject' is missing in the headers". So PUTting the manifests succeeds; it is only the response that is not providing the expected header!

If you compare it with the results I get running my branch against GitLab.com, you see that the PUT request itself has failed. That is because GitLab.com currently rejects manifests with an unknown artifact type (no matter whether it is set via artifactType or via config.mediaType).

[screenshot]

So, thanks to these changes, GitHub and GitLab can better understand what they still need to fix, and clients can better understand what will work (perhaps by being permissive and ignoring the response header that GitHub omits) and what will not.
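A minimal sketch of the kind of check behind those reports, with the two conditions asserted and reported separately; the function, URL handling, and error strings are simplified placeholders, not the PoC code:

```go
package conformance

import (
	"bytes"
	"fmt"
	"net/http"
)

// pushManifestWithSubject PUTs a manifest and reports the two conditions seen
// above separately: whether the PUT was accepted at all, and whether the
// response carried the OCI-Subject header.
func pushManifestWithSubject(registry, repo, tag string, manifest []byte) error {
	url := fmt.Sprintf("%s/v2/%s/manifests/%s", registry, repo, tag)
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(manifest))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/vnd.oci.image.manifest.v1+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusCreated {
		// Distinct failure: the registry rejected the manifest outright
		// (the GitLab.com behaviour described above).
		return fmt.Errorf("PUT %s: expected 201, got %d", url, resp.StatusCode)
	}
	if resp.Header.Get("OCI-Subject") == "" {
		// Distinct failure: the manifest was accepted, but the OCI-Subject
		// header is missing from the response (the GitHub behaviour).
		return fmt.Errorf("PUT %s succeeded, but the OCI-Subject header is missing", url)
	}
	return nil
}
```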


I am aware that some of the issues becoming visible with my PoC are related to the image-spec and not to the distribution-spec, but...

  1. The distribution-spec conformance tests are also covering part of the image-spec the moment they start using custom artifact types (which are irrelevant for the distribution-spec).
  2. That is my main concern, and I cannot find a better place to address it than these tests.

I would much rather see a holistic approach over adding more patches onto the existing conformance tests. Adding tests for referrers already stretched the existing model beyond the initial design. Trying to add something like sha512 tests, where everything would need to be duplicated to ensure blobs, manifests, tags, and referrers all work with the alternative digest, would be difficult and error-prone to maintain.


I can understand your reasons. A holistic approach is probably needed...

But I have my reasons for sharing my proposals for specific changes to the existing tests here:

  1. I do not want to wait for something that I need and that might not come for months. I am doing it anyway, so I can share it here for free. Since I know your preferences on the approach, I won't feel disappointed if you simply ignore them.
  2. My change proposals illustrate a couple of points that IMHO should be considered in the new holistic approach:
    1. Finer granularity on introduced changes: a manifest using a custom artifact type and a subject might get rejected because one of them is not supported, but you won't know which one, or whether both fail.
    2. Finer granularity in testing and reporting failures: if a test node has too many steps that might fail, the report will be hard to read without consulting the source code.
    3. Compact and user-oriented reporting of successes: push tests should not be reporting tests that are expected to be working anyway (like a GET after a successful PUT).
    4. More meaningful reports: automatic error reporting (expected 'x', got 'y') does not tell a user much. It does not tell you whether the error happened while PUTting the manifest or while GETting it afterwards (see the sketch below).
    5. Others to come?
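As a small sketch of point 4, Gomega's optional description arguments can attach the step to the failure message; doPut/doGet here are hypothetical helpers standing in for the real requests:

```go
package conformance_test

import (
	"net/http"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// Hypothetical helpers standing in for the real PUT/GET requests.
var doPut func(ref string) *http.Response
var doGet func(ref string) *http.Response

var _ = It("pushes a manifest and pulls it back", func() {
	ref := "test/repo:latest" // placeholder reference

	putResp := doPut(ref)
	Expect(putResp.StatusCode).To(Equal(http.StatusCreated),
		"PUTting the manifest %s failed", ref)

	getResp := doGet(ref)
	Expect(getResp.StatusCode).To(Equal(http.StatusOK),
		"GETting the manifest %s back after a successful PUT failed", ref)
})
```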