sigstore/helm-charts

High availability for Sigstore services

ndegory opened this issue · 2 comments

Description

What needs to be improved

Documentation on how to run the Sigstore services with high availability.

Context

The Helm charts in sigstore/helm-charts default to a single replica for all services. Some charts allow setting replicaCount for the main service, but there are no guidelines on how to make the service highly available.

As an example, the Rekor chart has a replicaCount for the rekor server, but the chart also deploys mysql and redis, without any option to run them with more than one replica. For these dependencies, it's more complicated than changing the replicaCount in the deployment or statefulset. The same can be said for Trillian: its mysql dependency doesn't allow an HA configuration.

What should be done

  • Update the documentation with the current options for raising the replica counts (is this reliable, or should we leave them at 1?)
  • Enhance the Helm charts to allow an HA configuration for the dependencies (mysql, redis)

Hi @ndegory, thanks for opening this issue. Yes, AFAIK Sigstore does not have documentation on the scalability aspects of any component. @ianhundere has done some great work on adding tolerations, nodeSelector and affinity to all the Helm charts, which helps with scalability and highly available setups.

The following components are stateless and easy to scale using replicaCount and affinity/anti-affinity settings:

  • Fulcio
  • CTLog
  • Rekor
  • Trillian LogServer
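As a rough illustration of the replicaCount and anti-affinity settings mentioned above, a values override could look like the sketch below. The nesting under `server` and the `app.kubernetes.io/name: rekor` label are assumptions made for illustration; each chart's values.yaml should be checked for the actual key paths and pod labels. The `affinity` body itself follows the standard Kubernetes pod anti-affinity schema.

```yaml
# Hypothetical values override for a stateless component (e.g. the Rekor server).
# The "server" nesting and the label below are assumptions, not confirmed chart keys.
server:
  replicaCount: 3
  affinity:
    podAntiAffinity:
      # Prefer spreading replicas across nodes so that a single node
      # failure does not take down every replica.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: rekor  # assumed pod label
```

Using the `preferred` form keeps pods schedulable on small clusters; the `required` form would enforce strict spreading at the cost of pending pods when nodes run out.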

For Trillian LogSigner, we need to set up etcd for leader election; this information is available in detail here.

The Trillian Helm chart (Rekor's dependency) supports spinning up a MySQL instance automatically, but I believe the intention was to make it as easy as possible to get Sigstore up and running for testing and development. The burden of scaling MySQL is on the user, and most production setups use hosted MySQL on public clouds. The same applies to Redis for Rekor.

We could do better by adding existing popular MySQL and Redis Helm charts as dependencies of the Sigstore helm-charts, making them directly available to private Sigstore users and operators. We could also add features like HPA (Horizontal Pod Autoscaler), which help with high availability.
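For reference, an HPA for one of the stateless components could be sketched as follows. This is a plain Kubernetes `autoscaling/v2` manifest, not something the charts currently render; the Deployment name `rekor-server` and the replica and CPU thresholds are hypothetical values chosen for illustration.

```yaml
# Minimal HorizontalPodAutoscaler sketch (Kubernetes autoscaling/v2).
# The target name and all thresholds are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rekor-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rekor-server   # assumed Deployment name
  minReplicas: 2          # keep at least two replicas for availability
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

CPU-based scaling requires the target containers to declare CPU resource requests and a metrics server to be running in the cluster.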

Thank you @vipulagarwal for the overview of the HA options. I don't think it would be worth adding more dependencies. I'll try disabling mysql and redis in the values file and pointing to a MySQL cloud service and an HA deployment of Redis.
If that works, I'll submit a documentation PR.
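A sketch of what such a values file might look like is shown below. Every key name here (`enabled`, `hostname`, `port`) and both hostnames are assumptions for illustration only; the real key paths must be taken from the Rekor and Trillian charts' values.yaml.

```yaml
# Hypothetical Rekor values override: disable the bundled mysql and redis
# and point to externally managed services. Key names are assumptions.
mysql:
  enabled: false
  hostname: mysql.example.internal   # placeholder for a hosted MySQL endpoint
  port: 3306
redis:
  enabled: false
  hostname: redis-ha.example.internal  # placeholder for an HA Redis endpoint
  port: 6379
```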