project-zot/zot

[Feat]: Scale-out cluster support for independent per-instance storage deployments

vrajashkr opened this issue · 6 comments

Is your feature request related to a problem? Please describe.

The current scale-out cluster implementation is supported only on shared-storage deployments: many instances backed by one reliable shared storage service such as S3.

Currently, some REST APIs will not work for deployments where the image storage for each instance is isolated (for example, one zot instance per VM with independent disk storage).

Describe the solution you'd like

In a scale-out cluster deployment using isolated per-instance storage, APIs such as /v2/_catalog as well as the GraphQL endpoints will need to fan out requests to all the members and aggregate the received data before returning results.

Describe alternatives you've considered

N/A

Additional context

No response

/v2/_catalog is easier to handle as the logic is entirely implemented using data structs.
For this, we can simply detect a scale out cluster in non-shared storage mode and then fan-out + aggregate.

However, the GQL APIs are served by a GQL server that runs auto-generated GQL code, the implementation of which cannot be changed.
Additionally, there is a need to support pagination.

extRouter.Methods(allowedMethods...).
		Handler(gqlHandler.NewDefaultServer(gql_generated.NewExecutableSchema(resConfig)))

One approach might be to look into the Resolver code (resolver.go) and modify the data read to include proxied results before the response is taken for pagination.

For example:

func (cveinfo BaseCveInfo) GetImageListForCVE(ctx context.Context, repo, cveID string) ([]cvemodel.TagInfo, error)

This function must ask all other nodes for images having this CVE before returning the TagInfo objects.

That logic could be introduced as a wrapper on top of these functions that will do the following:

  1. Call the function locally
  2. Call the function for all other members
  3. Merge the local and remote results and return them to the client
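A rough sketch of such a wrapper, just to make the three steps concrete. `TagInfo` here is a simplified stand-in for `cvemodel.TagInfo`, and `queryFunc`, `fanOut` and the stubs are hypothetical names, not existing zot code. In a real deployment each member function would issue a request to another zot instance rather than return canned data.

```go
package main

import (
	"context"
	"fmt"
)

// TagInfo is a simplified stand-in for cvemodel.TagInfo (assumption).
type TagInfo struct {
	Repo string
	Tag  string
}

// queryFunc is a hypothetical common signature for the local lookup and for
// a lookup that is sent to one other cluster member.
type queryFunc func(ctx context.Context, repo, cveID string) ([]TagInfo, error)

// fanOut runs the local lookup, then the lookup on every other member, and
// returns the combined, unpaginated result.
func fanOut(ctx context.Context, local queryFunc, members []queryFunc, repo, cveID string) ([]TagInfo, error) {
	localTags, err := local(ctx, repo, cveID)
	if err != nil {
		return nil, fmt.Errorf("local lookup: %w", err)
	}
	// Copy so that appending never writes into the local function's slice.
	combined := append([]TagInfo(nil), localTags...)
	for i, member := range members {
		remoteTags, err := member(ctx, repo, cveID)
		if err != nil {
			return nil, fmt.Errorf("member %d lookup: %w", i, err)
		}
		combined = append(combined, remoteTags...)
	}
	return combined, nil
}

// static returns a queryFunc that always yields the given tags (stub).
func static(tags ...TagInfo) queryFunc {
	return func(context.Context, string, string) ([]TagInfo, error) {
		return tags, nil
	}
}

// runFanOutExample wires one local source and two stubbed members together.
func runFanOutExample() ([]TagInfo, error) {
	local := static(TagInfo{Repo: "alpine", Tag: "3.18"})
	members := []queryFunc{
		static(TagInfo{Repo: "nginx", Tag: "1.25"}),
		static(), // a member with no affected images
	}
	// The CVE ID is a placeholder, not a real identifier.
	return fanOut(context.Background(), local, members, "", "CVE-XXXX-YYYY")
}

func main() {
	tags, err := runFanOutExample()
	if err != nil {
		panic(err)
	}
	fmt.Println(tags)
}
```

This version fails the whole request if any member fails. Whether partial results are acceptable is a separate decision.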

To facilitate this, there needs to be a new set of API endpoints that return this data without pagination, i.e., the same GQL query except without any pagination.

Since the request data does not flow to the resolver, there would be a need to bring that logic to the resolver so GetImageListForCVE knows to call GetImageListForCVE on all other hosts via the API. Each zot instance would potentially become a GQL client as well.

@vrajashkr do you want to start with paginated result (aggregate) of paginated results (fanout)? The second part is what needs to be redesigned? Maybe we simply add a pagination=off param.

Separately, keep an eye on sync as a means to "heal" after a cluster resize.

There are 2 major items to handle:

/v2/_catalog

The approach for this would be fairly straightforward as there is no pagination for this.
[Diagram: zot-local-scale-out-catalog]

  • The other zot instances will not re-proxy the requests as the proxied request will contain a custom zot header that the instances will read and use as a reference to prevent a re-proxy. This avoids a proxy storm.
  • The client-serving instance of zot (zot 1 in the image) will merge the deserialized responses with the local data in struct form. This keeps things simple and reliable.
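To illustrate the two points above, here is a minimal sketch of a catalog handler that fans out once and merges in struct form. The header name `X-Zot-Cluster-Proxied`, the `catalogHandler` and `fetchCatalog` helpers, and the peer list callback are all assumptions made for this example. Deduplicating and sorting the merged list is also an assumption, not something decided above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
)

// proxiedHeader is a hypothetical header name; the real one is not decided here.
const proxiedHeader = "X-Zot-Cluster-Proxied"

// catalog matches the JSON body of /v2/_catalog.
type catalog struct {
	Repositories []string `json:"repositories"`
}

// fetchCatalog asks one peer for its local catalog and marks the request as proxied.
func fetchCatalog(r *http.Request, peerURL string) (catalog, error) {
	var out catalog
	req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, peerURL+"/v2/_catalog", nil)
	if err != nil {
		return out, err
	}
	req.Header.Set(proxiedHeader, "1")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return out, fmt.Errorf("peer %s returned %s", peerURL, resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// catalogHandler returns local repos and, unless the request is already a
// proxied one, merges in the catalogs of all peers.
func catalogHandler(localRepos []string, peers func() []string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen := map[string]struct{}{}
		for _, repo := range localRepos {
			seen[repo] = struct{}{}
		}
		if r.Header.Get(proxiedHeader) == "" { // never re-proxy a proxied request
			for _, peer := range peers() {
				remote, err := fetchCatalog(r, peer)
				if err != nil {
					http.Error(w, err.Error(), http.StatusBadGateway)
					return
				}
				for _, repo := range remote.Repositories {
					seen[repo] = struct{}{}
				}
			}
		}
		merged := make([]string, 0, len(seen))
		for repo := range seen {
			merged = append(merged, repo)
		}
		sort.Strings(merged)
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(catalog{Repositories: merged})
	})
}

// runCatalogExample starts two in-process instances that list each other as peers.
func runCatalogExample() ([]string, error) {
	var url1, url2 string
	srv1 := httptest.NewServer(catalogHandler([]string{"alpine", "busybox"}, func() []string { return []string{url2} }))
	defer srv1.Close()
	srv2 := httptest.NewServer(catalogHandler([]string{"busybox", "nginx"}, func() []string { return []string{url1} }))
	defer srv2.Close()
	url1, url2 = srv1.URL, srv2.URL

	resp, err := http.Get(url1 + "/v2/_catalog")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out catalog
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out.Repositories, err
}

func main() {
	repos, err := runCatalogExample()
	if err != nil {
		panic(err)
	}
	fmt.Println(repos)
}
```

Both test instances list each other as peers, so without the header check the two handlers would call each other indefinitely.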

/v2/_zot/ext/search

Here's a diagram representing the approach we discussed:
[Diagram: zot-local-scale-out-search-gql]

  • The existing 99designs route handler gqlHandler.NewDefaultServer(gql_generated.NewExecutableSchema(resConfig)) will be wrapped by a custom handler which handles the proxy logic.
  • The other zot instances will not re-proxy the requests as the proxied request will contain a custom zot header that the instances will read and use as a reference to prevent a re-proxy. This avoids a proxy storm.
  • When the request is seen with the custom zot proxy header set, the other instances will ignore pagination and should return their full dataset. The instance serving the client will merge the results and then apply pagination. We can probably optimize this later, but it probably requires a decision to address issues such as results truncation - do we want results from a subset of instances or a paginated subset of results from each instance? Additionally, the results will need to be consistent in each query for the given scale-out cluster.
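As a small sketch of the "merge first, then paginate" step on the client-serving instance: `mergeAndPaginate` is a hypothetical helper, results are reduced to plain strings, and treating a non-positive limit as "no limit" is an assumption. Sorting before slicing is what keeps pages consistent across repeated queries.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeAndPaginate combines the full result sets returned by each instance,
// sorts them into a deterministic order, and only then applies offset/limit.
// A limit <= 0 is treated as "no limit" (assumption).
func mergeAndPaginate(perInstance [][]string, offset, limit int) []string {
	merged := []string{}
	for _, results := range perInstance {
		merged = append(merged, results...)
	}
	// A stable, cluster-wide order makes every page reproducible.
	sort.Strings(merged)

	if offset < 0 {
		offset = 0
	}
	if offset >= len(merged) {
		return []string{}
	}
	end := len(merged)
	if limit > 0 && offset+limit < end {
		end = offset + limit
	}
	return merged[offset:end]
}

func main() {
	perInstance := [][]string{
		{"repo-c", "repo-a"}, // local results
		{"repo-d", "repo-b"}, // results from another member
	}
	fmt.Println(mergeAndPaginate(perInstance, 1, 2))
}
```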

Challenges with this approach:

  • Since this approach is proxying at a level above the GQL handler, the new code needs to be able to read GQL schemas in order to form the right structs and merge the response data. (Need to figure out how to do this).
  • Pagination queries will pay a penalty as they fetch all the data before paging it (kind of defeats the point of pagination), but this is needed for consistency of response for the cluster as a whole.

@andaaron @rchincha
It would be great to get your thoughts on the approaches detailed above! Thanks!

GQL Query behaviours:
reference: pkg/extensions/search/schema.graphql

Proxy-once to target member or handle locally

CVEListForImage
ImageList - only if there is a repo specified in the arguments
ExpandedRepoInfo
GlobalSearch - when searching for a tag inside a repo
Image
Referrers

Fan-out proxy to all other members + local

ImageListForCVE
ImageListWithCVEFixed
ImageListForDigest
RepoListWithNewestImage
ImageList - when the repoName is sent as ""
GlobalSearch - when searching for repos
DerivedImageList
BaseImageList

Depending on meta storage strategy

If the metadata DB is shared, these queries can be processed locally; if the metadata DB is not shared, they need to be fanned out:
StarredRepos
BookmarkedRepos
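The three groups above could be captured in a small routing function in the proxy wrapper. This is only a sketch: `strategy`, `strategyFor` and the `repo` / `sharedMetaDB` parameters are hypothetical, and it assumes the caller has already worked out whether an ImageList or GlobalSearch request is scoped to a single repo.

```go
package main

import "fmt"

// strategy describes how the proxy wrapper treats a GQL query (hypothetical type).
type strategy string

const (
	proxyOnce     strategy = "proxy-once-or-local" // send to the owning member, or handle locally
	fanOutAll     strategy = "fan-out"             // ask all other members and merge with local data
	handleLocally strategy = "local"               // answer from the shared metadata DB
)

// strategyFor maps a query name to a strategy. repo is the repo the query is
// scoped to ("" if none); sharedMetaDB says whether the metadata DB is shared.
func strategyFor(name, repo string, sharedMetaDB bool) (strategy, error) {
	switch name {
	case "CVEListForImage", "ExpandedRepoInfo", "Image", "Referrers":
		return proxyOnce, nil
	case "ImageList", "GlobalSearch":
		// Scoped to one repo: a single member owns the data.
		if repo != "" {
			return proxyOnce, nil
		}
		return fanOutAll, nil
	case "ImageListForCVE", "ImageListWithCVEFixed", "ImageListForDigest",
		"RepoListWithNewestImage", "DerivedImageList", "BaseImageList":
		return fanOutAll, nil
	case "StarredRepos", "BookmarkedRepos":
		if sharedMetaDB {
			return handleLocally, nil
		}
		return fanOutAll, nil
	default:
		// Queries that need handler changes are not covered by plain proxying.
		return "", fmt.Errorf("no proxy strategy for query %q", name)
	}
}

func main() {
	for _, repo := range []string{"", "alpine"} {
		s, err := strategyFor("ImageList", repo, false)
		fmt.Printf("ImageList repo=%q -> %s (err=%v)\n", repo, s, err)
	}
}
```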

Handler logic change to dynamically query other zot instances

CVEDiffListForImages - the data for each image (subtrahend and minuend) is available from the single zot instance that holds that image. Depending on where the request lands, there may be two requests made to other zot instances (if neither image is on the current instance), one request (if exactly one of the images is not available locally), or no requests (if both images are available locally). Note: this requires a change in the handler itself, as there is no logical way to proxy the incoming request to get the required data.
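A tiny sketch of the 0/1/2 request cases described above. `remoteTargets` is a hypothetical helper, and how the owning member of each image is determined is left out and assumed to be known to the caller.

```go
package main

import "fmt"

// remoteTargets returns the members that must be queried for a CVE diff:
// one entry per image (minuend, subtrahend) that is not held by self.
func remoteTargets(self, minuendOwner, subtrahendOwner string) []string {
	targets := []string{}
	for _, owner := range []string{minuendOwner, subtrahendOwner} {
		if owner != self {
			targets = append(targets, owner)
		}
	}
	return targets
}

func main() {
	fmt.Println(len(remoteTargets("zot-1", "zot-1", "zot-1"))) // both images local
	fmt.Println(len(remoteTargets("zot-1", "zot-1", "zot-2"))) // one image remote
	fmt.Println(len(remoteTargets("zot-1", "zot-2", "zot-3"))) // both images remote
}
```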