opencontainers/distribution-spec

[Maintainer vote] Is requiring registries to accept non-existent subjects a breaking change?

jdolitsky opened this issue · 25 comments

Note: Please do not comment on this issue. Please contain all discussion on this topic to #459.

This topic had been discussed in depth in #459, #483, in-person at KubeCon NA `23, and in various OCI meetings over the last few months. This issue is to get a final vote from the acting distribution-spec maintainers on the following question:

Is requiring registries to accept non-existent subjects a breaking change?

Maintainer Vote
@sudo-bmitch no
@dmcgowan yes
@jzelinskie no
@jonjohnsonjr no
@jdolitsky no
@mikebrow yes
@stevvooe yes
@vbatts no

Maintainers: please edit this issue description to add your vote. Please use one of the following answers:

  • "yes" (this is a breaking change)
  • "no" (this is not a breaking change)

Note: Please do not comment on this issue. Please contain all discussion on this topic to #459.

I stated before I don't want the political discussion there but to keep it about proposed solutions.

This does not need to be a winners and losers issues, every use case can be supported. I think requesting a vote pointing to a long and hard to follow discussion is not fair to the maintainers.

@dmcgowan The intention was to keep this issue as a completely non-biased vote amongst maintainers. I understand if we want to discuss someplace outside of #459, but I don't think the discussion belongs here.

I'm generally of the opinion that if we reference an object in another object, the object must exist in the registry. We've built the image spec and distribution implementations with this concept from the beginning. It prevents a lot of mistakes in pushing content that could otherwise be exposed to users.

The enforced order ("DAG order" we'll call it), would be the following for a push that follows this invariant:

  1. images and layers.
  2. image-index.
  3. signature.

It sounds like the benefit of relaxing this ordering is to allow one to push a signature or SBOM before the image-index is pushed, so that when one tries to pull the new image, the attachment is guaranteed to exist.

Is it worth relaxing the fundamental design principle of the registry to solve this particular problem? I think we need a better statement of impact before we drop a design principle that has gotten us this far.

I'm voting "Yes", but with the statement that we need to make it a MUST to validate the presence of targeted subject. I am not aware of how this will cause problems with existing implementations on the client side, but understand if they need to relax adherence. If there is a clear use case that can't be done with out relaxing this validation, I'm generally open to hearing about it (and acknowledge that I may have missed it in prior discussion).

Since the levee has broke and it looks like we're having the discussion here, I'm at least going to lock it and limit to maintainers 😅 carry on

I'm voting "Yes", but with the statement that we need to make it a MUST to validate the presence of targeted subject.

That would be a breaking change to me. The spec says registries MAY validate the content, there's nothing existing in the spec that says any one field in a manifest MUST be validated. There was a push to support sparse manifests that required little in the spec to change because we already supported them. Pushing things in DAG order is useful for consistency, but the perception of what that order is varies based on the perception of whether this is a single DAG or two DAGs associated by an API.

The direction I've been pushing for is to acknowledge that this is a new field with different usage from the existing fields, where different behavior is appropriate. We do not give out the reference to a signature to runtimes and have them pull the image from the subject field. Consumers pull the image, and expect the content associated with that image to exist, including the layers and config, and now we're adding the signature, SBOM, and potentially other metadata. For implementations that want to avoid breaking this logic on their consumers, it would be good for the spec to give them the ability to push content in either order.

Compared to OCI image-spec+distribution-spec 1.0, adding new validation to this field is a breaking change. Content that a user could push to an OCI conformant registry today will not be able to be pushed to a registry after is upgrades to the 1.1 specs with the added validation, since the OCI spec requires that unknown fields do not result in errors from registries and consumers.

The original spec and implementation returned an error for each missing blob: https://distribution.github.io/distribution/spec/api/#pushing-an-image-manifest. This seems to have been relaxed in a92b62e. It sounds like subject should fall into the existing MAY language and we should move on.

I don't see a valid reason to allow upload of invalid or out of order content. The signature, SBOM and most other metadata cannot exist until an image has a digest and that usually happens during the push preparation phase.

@stevvooe Additionally, beyond the order issue, there is a thought that some registries and mirrors may never hold the images or layers being referenced by subject, and will instead generate a new image index (via referrers api) with a set of objects having particular object subject references... Essentially a new registry type..

I wholly agree with the point of making it a MUST to validate the presence of a targeted subject at least before these objects are allowed to be pulled.. and also recognize the enforced DAG order as vital for image registries.

One possible soln would be language in the spec requiring at least one of two patterns (traditional DAG order required for repos that only allow present images before referring artifacts can be pushed/pulled, and un-traditional presence variants with some other validation implementation.)

Let's see what Derek comes up with, surely there are many options to resolve the impasse.

I don't see a valid reason to allow upload of invalid or out of order content.

@stevvooe this was captured as a requirement at the onset of the reference types working group (in March 2022). See item 3 here: https://github.com/opencontainers/wg-reference-types/blob/main/docs/REQUIREMENTS.md#content-management

"As a user, I want to push an artifact that references another artifact that doesn't exist on the registry yet."

As a user, I want to push an artifact that references another artifact that doesn't exist on the registry yet.

But why? What does this actually allow one to do that can't be done with current model.

It also seems at odds with this requiremet:

As an artifact producer, I want to push multiple artifacts concurrently (possibly from separate systems) without encountering race conditions or other errors.

If I can push a signature that references a non-existing artifact, I've created a race condition in the referrers API that didn't previously exist, which may never be resolved.

As a user, I want to push an artifact that references another artifact that doesn't exist on the registry yet.

But why? What does this actually allow one to do that can't be done with current model.

From previous discussions, it allows the signatures to be pushed before the image so that consumers never attempt to pull an unsigned image. It also allows metadata to be stored in a separate repository from the images they extend (I've heard the example of a security team that wants to push metadata to a restricted repository that only they can access, without the need to mirror every image, that does require tagging the metadata, and client tooling needs to be configured with an alternate repository to query the referrers API).

It also seems at odds with this requiremet:

As an artifact producer, I want to push multiple artifacts concurrently (possibly from separate systems) without encountering race conditions or other errors.

If I can push a signature that references a non-existing artifact, I've created a race condition in the referrers API that didn't previously exist, which may never be resolved.

I'm not sure I follow the question. That requirement was looking at the issue with concurrently pushing multiple platforms for an image from different pipelines and not having a way to avoid race conditions from the different pushes modifying the index simultaneously. If multiple jobs each push a different SBOM, having the referrers response generated by the registry gives the opportunity to resolve the race condition on the server. Without that we'd need to require registries to support conditional requests for clients to be able to implement it.

A race condition between the signature and image being pushed, if the signature is pushed first, feels very similar to the race condition between pushing the layer and the image manifest when the client is hoping the layer doesn't get GC'd before they push the manifest. At least from the client perspective, I understand the server may see that very differently.

vbatts commented

I vote "no"
Reasoning being there was slight precedent in the concept with the remote non-distributable layers.
As for the use case, I'm personally in favor of being able to have a subject that is not required to be on the same registry.

How is splaying an image and its associated content across registries is good for users?

It seems like with a MAY here applied to subject, as it is applied to other references, is the correct language, while still allowing these use cases to exist on outside registries, which can choose to not validate the presence. This preserves the baseline behavior and allows registries that want to break up content to not implement the validation.

Reasoning being there was slight precedent in the concept with the remote non-distributable layers.

This is not a good feature to point at since it is the only part of the spec that has lead to serious security issues. Registries today are also allowed to reject manifests with urls, I would also consider it a breaking change to say a registry must accept all urls.

As for the use case, I'm personally in favor of being able to have a subject that is not required to be on the same registry.

That is the inverse of this change (MUST vs MUST NOT), we are only considering MUST vs MAY. I am also in favor of the spec supporting this use case though. The current language says a registry MUST support a subject which is not in the registry (whether that is because it is on another registry or will come later is not know during upload). That is the breaking change, since registries today MAY require a referenced object to exist locally before accepting the content.

How is splaying an image and its associated content across registries is good for users?

I want to point out that this isn't just theoretical and is already happening today. Please see this section in the cosign project README, where COSIGN_REPOSITORY allows storing/locating image signatures in a separate registry. People are successfully using this feature today.

There are a number of real use-cases. For example, a user may be in a highly-regulated environment with strict rules on where certain data can be stored.

If it isn't guaranteed via the MUST language that a registry will support a manifest with a subject in a separate registry, then it is highly likely that cosign will likely NOT adopt the OCI 1.1 spec to maintain this functionality. The result of this will be the continued proliferation of sha256-* digest tags. This is one of the primary things that the WG set out to eliminate.

If it isn't guaranteed via the MUST language that a registry will support a manifest with a subject in a separate registry, then it is highly likely that cosign will likely NOT adopt the OCI 1.1 spec to maintain this functionality. The result of this will be the continued proliferation of sha256-* digest tags. This is one of the primary things that the WG set out to eliminate.

Can you explain this one a little further. If a tool is trying to push up referrers to a a registry which does not have the subject, how does MAY stop that? The MAY could just signal to the client that this registry requires the referenced content to be pushed up as well. So if say Docker Hub was the original source and your reference registry was the target, then the reference registry could support subject manifests without the subjects even if Docker Hub did not. If the worry is that the client may come across an attempt to push referrers to a registry which requires content, then the client can either push the content or reject the upload. Not implementing it all doesn't make sense since the currently language already says clients MUST implement the 1.0 fallback.

With the current language in the release candidate / main branches of image and distribution.., yes the MUST language as written here in dist is a breaking change.

As the subject reference is a descriptor, a quick check shows this MUST in dist breaks with the image spec https://github.com/opencontainers/image-spec/blob/main/descriptor.md for at least five descriptor rules.

Additionally there are a few (at least) broken rules here in the dist spec because of this change and more that need clarification.

Further, the term MUST "accept" here in dist is not well defined, as the registry MAY accept then garbage collect immediately after calling into question the meaning and depth of the MUST for the use case.

I never explained my own vote. I'm voting "no" because:

  • Including this content in the DAG is counter to how clients treat the DAG. Clients pull the root node of a DAG and then one or more child objects of that DAG, recursively. For the "subject/referrers" use case, clients pull the image and then query the registry to locate metadata on that image. The subject field isn't typically followed by clients, and is instead treated as an ID for a query API parameter. Since it is a new field, I believe we could make this explicit in the image-spec to call out that it isn't to be treated as a normal descriptor.
  • A change in the spec language would make some content pushed following the existing OCI image-spec 1.0 non-portable to any registries adding validation to this field. We should be careful anytime we make a change that results in non-portable content and ensure that it's necessary.
  • There are multiple use cases to push content in different orders.
    • Tooling may be adding metadata an existing image (pushing after the image was pushed). Changing the language would only support this scenario.
    • Uploading a new image with metadata (pushing the metadata first so it exists when the image is pushed). This scenario would not be supported directly, but a tag hack could be used by clients to workaround the restrictions.
    • Pushing only metadata (useful in repositories that have a separate security/isolation requirement from the image content and some metadata, i.e. one team owns the images while another owns the security scan results). This scenario would be blocked by any registry validating the subject field, requiring users to either copy data that would never be pulled, or search for a registry that does not validate the field.
  • The image-spec was updated at the request of clients that wanted to avoid a mix of registry implementations, where content could be pushed to some but not all registries. This shows the value in OCI providing a baseline functionality that all clients can assume, rather than fragmenting registry functionality.
  • When the community weighed in on the vote, there was a strong desire to support the current language (even after consolidating multiple votes from at least two companies).
  • With the proliferation of multi-platform images, there is user demand for better support of sparse manifests, showing the current DAG validation is already being disabled for other use cases.

If there's a way we can resolve this in a way to registries and clients are both happy, I'd jump at the opportunity. Since we've been discussing this for 3 months with no solution in sight, I'm not confident more discussion will find that solution, so I'm voting with the community to move forward.

the question being voted on is the wrong question... IMO..

Should there be use case support for OCI registries having the option to receive and store artifacts having a subject descriptor reference to a portion of a merkle dag index/image that it is not a part of [x] yes (in this yes the language should be clear this is a new type of PUSH api having different rules and possibly requiring different registry indexing and garbage collection.) Should a registry be allowed to reject or at least subsequently garbage collect these artifacts if they are not placed in a dag that includes the subject descriptor referenced object and/or have not been validated? [x] yes

Should there be force language here stating that the registry MUST validate subject descriptors (repeating the current rule in image) or optionally provide an indication of validity for artifacts not in a portion of the index/image dag [x] yes

IMO, I don't think this is a breaking change.

I view the referrers as the root of a separate tree with its own validation semantics that are purposely distinct from images.
I do believe that the DAG guarantees raised by @stevvooe are an important part of the design for images and a discussion around sparse images is coming, but is orthogonal to the definition of "valid" for referrers' subjects.

To reject this now would break relied upon functionality that was also a part of the original design requirements for referrers.

Gronking the point that a manifest with a subject descriptor field that references a manifest that does not exist MAY be "the root of a separate tree with its own validation semantics purposely distinct from images." Noting that those words are implied by the MUST language but are not written down in the spec.

That same manifest MAY also be added in such a way that is not distinct from images and still adheres to the DAG guarantees.

The MUST accept language reads, to me, as forcing the choice and appears to be contradicting to the existing specs.

I think the question is moot, regardless...

My interpretation is that it's impossible for this to be a breaking change because prior to the subject field existing, this line (that got removed in opencontainers/image-spec#1030... accidentally? why was that merged?) wouldn't allow them to reject it.

Gronking the point

I have no idea what this means.

I think the question is moot, regardless...

My interpretation is that it's impossible for this to be a breaking change because prior to the subject field existing, this line (that got removed in opencontainers/image-spec#1030... accidentally? why was that merged?) wouldn't allow them to reject it.

How would they process referrers requests without adding support for the new fields.

Agree the unknown fields and media types MUST be ignored part should be put back.. the additional clarification and not generate an error is ok I think.

grok.. typo

@jonjohnsonjr if your interpretation of "A registry MUST accept an otherwise valid manifest with a subject field that references a manifest that does not exist, allowing clients to push a manifest and referrers to that manifest in either order" is the unavailable fallback path described here because they must ignore the field.. well ok.. but it doesn't say that and the current wording of the fallback doesn't support that interpretation.

Thanks everyone for voting and sharing your thoughts. The result of the vote is 5 "no" and 3 "yes". I am closing this issue.