project-zot/zot

[Feat]: Persist API keys

AndersBennedsgaard opened this issue · 10 comments

Is your feature request related to a problem? Please describe.

Whenever we restart Zot, user API keys appear to be removed. Users then have to recreate their keys before they can interact with Zot through automation.

Describe the solution you'd like

It seems like the API keys are stored in the cache service: https://github.com/project-zot/zot/blob/main/pkg/api/routes.go#L2106

However, it doesn't really make sense to store something like API keys in a temporary cache, since the keys are long-lived and caches, by definition, are not.

Instead, the API keys should be stored in some long-lived storage, like S3.

Hi @AndersBennedsgaard, the keys are NOT stored in a temporary cache. They are stored in DynamoDB (AWS case) or in BoltDB (local case), depending on your configuration. Can you please provide more details on the config you are using?

According to https://zotregistry.dev/v2.1.0/articles/storage/?h=bolt#cache-drivers:

A cache driver is used to store duplicate blobs when dedupe is enabled. zot supports database caching using BoltDB as the cache driver for local filesystems and DynamoDB for remote filesystems.

To me, it sounds like BoltDB is used as a cache, and not a long-lived data store. Is that not correct?

My storage configuration looks like:

storage:
  rootDirectory: /var/lib/registry
  dedupe: false
  storageDriver:
    name: s3
    regionendpoint: http://minio.minio.svc.cluster.local:9000
    region: us-east-2
    bucket: zot
    skipverify: false
    secure: false
  gc: true
  gcDelay: 1h
  gcInterval: 24h

I have not set up the /var/lib/registry directory to persist across restarts or to be shared between my two Zot replicas. I guess I have to do that?

> To me, it sounds like BoltDB is used as a cache, and not a long-lived data store. Is that not correct?

The word "cache" is used incorrectly in the config and documentation.
The original idea was that zot would use hardlinks to dedupe image blobs on disk, and the "cache" is supposed to hold that information in a BoltDB instance for fast(er) access. Since you could also identify which blobs could be or are deduped by looking at the blob digests on disk, the original wording in zot is that we "cache" this information in a DB. But it is by no means a short-lived cache. It is persisted, and it is faster than walking each folder in the root dir to look for duplicates.

Those "cacheDriver" settings are not for some temporary data.

I think that you could either:

  1. Make sure /var/lib/registry is persisted between startups. I think we removed the restriction which disallowed BoltDB from being used with S3 storage at some point, and I suspect in your case you are writing the data as a BoltDB file to /var/lib/registry. But with multiple zot instances I don't think this makes sense: either each instance has its own BoltDB file (so an API key would work on one instance and not on the other, for example), or they all reuse the same BoltDB file mounted in different containers, which could cause issues when writing to it (concurrency).

  2. Use DynamoDB, which should take care of your problems. You need to configure the cacheDriver settings in the zot config.
    Something like:

    "cacheDriver": {
      "name": "dynamodb",
      "endpoint": "http://localhost:4566",
      "region": "us-east-1",
      "cacheTablename": "ZotBlobTable",
      "repoMetaTablename": "ZotRepoMetadataTable",
      "imageMetaTablename": "ZotImageMetaTable",
      "repoBlobsInfoTablename": "ZotRepoBlobsInfoTable",
      "userDataTablename": "ZotUserDataTable",
      "versionTablename": "ZotVersion",
      "apiKeyTablename": "ZotApiKeyTable"
    }

But we don't have a non-Amazon equivalent of DynamoDB which would pair with minio.

  3. In testing we have used localstack for DynamoDB, but I am not sure how well it would work in production.
    Theoretically you could use minio for storing blobs and localstack for the DB only (not for blobs), kind of like we have in https://github.com/project-zot/zot/blob/main/test/gc-stress/config-gc-bench-s3-minio.json
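
To make the options above concrete, here is a minimal Python sketch of how the `cacheDriver` choice decides where API keys end up, as described in this thread. The function `metadata_backend` is hypothetical and deliberately simplified: zot's real selection logic lives in its Go code and may take other settings into account. The two sample dictionaries reuse only keys that appear in the configs quoted in this issue.

```python
def metadata_backend(storage: dict) -> str:
    """Return where metadata (including API keys) would live, per this thread.

    Simplified model, not zot's actual logic: a "dynamodb" cacheDriver means a
    shared remote DB; anything else falls back to a local BoltDB file under
    rootDirectory.
    """
    driver = storage.get("cacheDriver", {}).get("name", "boltdb")
    if driver == "dynamodb":
        return "dynamodb (shared by all replicas)"
    root = storage.get("rootDirectory", "<rootDirectory>")
    return f"boltdb file under {root} (local to one instance)"


# Storage section from the report above: S3 blobs, no cacheDriver configured.
reported = {
    "rootDirectory": "/var/lib/registry",
    "dedupe": False,
    "storageDriver": {"name": "s3", "bucket": "zot"},
}

# Same storage with the DynamoDB cacheDriver from option 2 added.
with_dynamodb = {
    **reported,
    "cacheDriver": {
        "name": "dynamodb",
        "region": "us-east-1",
        "apiKeyTablename": "ZotApiKeyTable",
    },
}

print(metadata_backend(reported))
print(metadata_backend(with_dynamodb))
```

Under this model the reported configuration keeps API keys in a file below /var/lib/registry, which is why they disappear when that directory is not persisted.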

I thought that Zot was considered stateless, such that running multiple replicas would be "easy", but after having read up on Clustering and Scale-out clustering it doesn't sound like this is the case.

My assumption was that since we use remote S3 storage, it would be fine to use a local cache (BoltDB) instead of DynamoDB, since the cache is only there to speed up requests. We could then set the number of replicas to more than 1 with no other configuration change and have Zot running in HA mode. This is obviously not true: shared state such as API keys, user data, Trivy scans, etc. is not stored in remote storage.

Since BoltDB uses an exclusive write lock I can't run multiple Zot instances on a shared BoltDB database, so I either have to reduce the number of replicas to 1 and persist the database, or use LocalStack DynamoDB/dynamodb-local for the cache driver, but I don't think these are production ready or even licensed such that we can use them.
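
The exclusive-lock problem can be illustrated with a small Python sketch. It is not BoltDB code: it uses an advisory `flock` from Python's standard `fcntl` module (Unix only) as a stand-in for the kind of exclusive file lock mentioned above, and two file descriptors in one process stand in for two replicas. The helper `try_exclusive_lock` and the file name `meta.db` are hypothetical. The lock is requested in non-blocking mode so the conflict shows up immediately instead of hanging.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path: str):
    """Open path and try to take an exclusive, non-blocking flock on it.

    Returns the open file descriptor on success, or None if another holder
    already has the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "meta.db")  # hypothetical file name

    first = try_exclusive_lock(db_path)   # "replica 1" opens the database
    second = try_exclusive_lock(db_path)  # "replica 2" is refused
    print("replica 1 holds lock:", first is not None)
    print("replica 2 holds lock:", second is not None)

    os.close(first)  # replica 1 shuts down and releases the lock
    third = try_exclusive_lock(db_path)
    print("after release:", third is not None)
    os.close(third)
```

Only one holder at a time can own the file, which matches the conclusion above: either a single replica with a persisted file, or a database that is designed to be shared.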

> The word "cache" is used incorrectly in the config and documentation.

Should that be fixed then? It sounds like the "cache" is used to store stuff that shouldn't be stored in a cache, but instead in actual databases which can be shared between Zot instances. For example: move API keys and user data to a PostgreSQL instance

It is stateless if you:

  • don't use features such as authentication (API keys and user information/preferences stored in the DB, but also sessions in general)
  • don't use the graphql api / UI (they need metadata which is not feasible to be read from the disk on every request, so we store it in the DB)
  • don't use image retention settings (they require image usage data which is not captured in the image spec, and cannot be read from the image manifest, such as the time when an image was last downloaded, and we store in the DB)
  • don't use the feature which verifies image signatures server-side and stores the results in the DB.

If you don't need any of the above, you can build zot-minimal, disable authentication in the zot configuration (you could try using bearer tokens which should be fine as far as I recall), and zot will be stateless.

So in a microservice-based use case, where zot is used simply for the dist-spec API (and not for authentication, UI, graphql queries and the rest of the above) and other services are responsible for those functions, you could have as many instances as you want using the shared storage.
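
The criteria above can be written down as a small checklist. This is only a sketch in Python: the feature names in `STATEFUL_FEATURES` are hypothetical labels chosen for this example, not zot configuration options, and the descriptions paraphrase the list in this comment.

```python
# Features that, per the list above, make zot keep state in its DB.
# The keys are hypothetical labels for this sketch, not zot config options.
STATEFUL_FEATURES = {
    "authentication": "API keys, user preferences and sessions",
    "graphql_or_ui": "repository and image metadata",
    "retention": "image usage data such as last download time",
    "signature_verification": "server-side verification results",
}


def stateful_reasons(enabled: set) -> list:
    """Return one line per enabled feature that needs the DB, sorted by name."""
    return [
        f"{name}: stores {STATEFUL_FEATURES[name]}"
        for name in sorted(STATEFUL_FEATURES.keys() & enabled)
    ]


def is_stateless(enabled: set) -> bool:
    """True when none of the enabled features needs the DB."""
    return not stateful_reasons(enabled)


print(is_stateless({"dist_spec"}))
print(stateful_reasons({"dist_spec", "authentication", "retention"}))
```

A deployment that only serves the dist-spec API passes the check; enabling any listed feature reintroduces state that has to live in a DB shared by all instances.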

> Should that be fixed then? It sounds like the "cache" is used to store stuff that shouldn't be stored in a cache, but instead in actual databases which can be shared between Zot instances. For example: move API keys and user data to a PostgreSQL instance

It is not a "cache". The wording should change in the documentation and the configuration. <-- @rchincha

We are open to accept contributions for other implementations of the MetaDB and UserDB (https://github.com/project-zot/zot/blob/main/pkg/meta/types/types.go#L61) using PostgreSQL, or other databases besides BoltDB and DynamoDB. There is a Redis PR somewhere but it is not complete as far as I know.
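
As a rough idea of what a pluggable API-key backend involves, here is a minimal Python sketch. It is not zot's MetaDB or UserDB interface (that is defined in the Go file linked above); the class `SqliteApiKeyStore`, its methods, and the table layout are all hypothetical. SQLite from the Python standard library stands in for a real database such as PostgreSQL, and the sketch stores only a SHA-256 digest of each key as one reasonable design choice.

```python
import hashlib
import os
import sqlite3
import tempfile


class SqliteApiKeyStore:
    """Hypothetical API-key store; sqlite3 stands in for a real shared DB."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS api_keys ("
            "key_hash TEXT PRIMARY KEY, user_id TEXT NOT NULL)"
        )
        self._conn.commit()

    @staticmethod
    def _hash(api_key: str) -> str:
        # Store only a digest so the raw key is never written to disk.
        return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

    def add(self, user_id: str, api_key: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO api_keys (key_hash, user_id) VALUES (?, ?)",
            (self._hash(api_key), user_id),
        )
        self._conn.commit()

    def lookup(self, api_key: str):
        """Return the owning user id, or None if the key is unknown."""
        row = self._conn.execute(
            "SELECT user_id FROM api_keys WHERE key_hash = ?",
            (self._hash(api_key),),
        ).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self._conn.close()


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "userdb.sqlite")  # hypothetical path

    store = SqliteApiKeyStore(db_path)
    store.add("alice", "example-key-123")
    store.close()  # simulate a restart

    reopened = SqliteApiKeyStore(db_path)
    survived = reopened.lookup("example-key-123")
    unknown = reopened.lookup("never-issued")
    reopened.close()

print(survived)
print(unknown)
```

The key survives the simulated restart because the database file does. A local SQLite file would still not help with multiple replicas; that requires a networked database that every instance can reach.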

> It is stateless if you:
>
> * don't use features such as authentication (API keys and user information/preferences stored in the DB, but also sessions in general)
> * don't use the graphql api / UI (they need metadata which is not feasible to be read from the disk on every request, so we store it in the DB)
> * don't use image retention settings (they require image usage data which is not captured in the image spec, and cannot be read from the image manifest, such as the time when an image was last downloaded, and we store in the DB)
> * don't use the feature which verifies image signatures server-side, and stores the results in the DB.

In summary: Zot would be stateless if you just use a remote database (DynamoDB, Redis, Postgres, whatever)?

According to Developer Guide / Onboarding, there are a couple of other cases which do not support remote storage: Trivy scans, user sessions (aren't these stored in the database?), PKI documents, and some temporary repo sync stuff. These should probably also support remote storage to make Zot actually stateless (except repo sync, I guess).

In my previous comment I mentioned the criteria zot uses for initializing the object responsible for communicating with the DB https://github.com/project-zot/zot/blob/main/pkg/api/controller.go#L324.
Yes, there are other cases which are not necessarily related to the DB but which should be considered for clustering design, see: #125 (comment).

Still, if you use the zot minimal binary (with no extensions) without authentication, you should be able to run a zot cluster with minio storage, regardless of what other files it may create locally (for the dedupe "cache" I mentioned before), provided you don't need any of the features I mentioned in this issue or in #125.

For clustering, best to keep every zot instance stateless. The counterparts from cloud to on-prem are:

s3 -> minio (for storage)
dynamodb -> redis (for "cache") - this is wip #2412

I feel like we are misunderstanding each other quite a lot here, but since the original "Persistence of API keys" is a non-issue, I will close this