midl-dev/polkadot-k8s

reduce cloud costs + introduce pulumi

nicolasochem opened this issue · 1 comments

There are several issues with the way polkadot is deployed:

High cloud costs

GKE egress costs $0.09/GB, and every polkadot validator by default uses 16 TB of egress per month. Moving to OVH or DigitalOcean would be a good way to bring costs down while still benefiting from a supported kubernetes platform.
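To put a number on this, here is a rough calculation using only the two figures quoted above. It assumes decimal units (1 TB = 1000 GB) and a flat per-GB rate with no free tier or volume discount, which may not match an actual bill; the function name is made up for illustration.

```typescript
// Assumption: decimal units and a flat per-GB rate (no free tier, no tiers).
const GB_PER_TB = 1000;

// Monthly egress cost in USD for a given volume and per-GB price.
function monthlyEgressCostUsd(egressTb: number, usdPerGb: number): number {
  return egressTb * GB_PER_TB * usdPerGb;
}

// Figures quoted in this issue: 16 TB/month at $0.09/GB.
const perValidatorUsd = monthlyEgressCostUsd(16, 0.09);
console.log(perValidatorUsd.toFixed(2)); // 1440.00
```

Under those assumptions, egress alone comes to roughly $1,440 per validator per month on GKE.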

Issues with templating

The kustomization method was initially preferred over helm because it is natively supported by kubectl; however, it is inadequate:

  • we are prefixing pod and service names with the three-letter namespace name, so we have to patch a lot of objects to make this mapping consistent
  • there are too many levels of patching: we create a yaml kustomize patch with terraform templating, which is in turn patching a kubernetes base resource
  • kustomize is not widely used
  • helm values.yaml structure is very useful to map complex concepts into infrastructure objects (see tezos-k8s project)

Issues with terraform k8s provisioning

We have been using the "shell Provisioner" to deploy new k8s resources with kustomize, but it is only additive: when I remove a resource from the source code, it is not automatically removed from the infra. Using pulumi and helm would solve this (but using terraform with the kustomize provisioner would also solve it).
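The difference between additive provisioning and the declarative behaviour wanted here can be sketched as a set comparison between what the source declares and what is deployed. This is a toy model only: the resource names are hypothetical, and real tools track far more state than a name.

```typescript
// Toy model: a resource is identified by a name only (hypothetical names below).
interface Plan {
  toCreate: string[]; // declared in source but not deployed yet
  toDelete: string[]; // deployed but no longer declared in source
}

function plan(desired: string[], deployed: string[]): Plan {
  const desiredSet = new Set(desired);
  const deployedSet = new Set(deployed);
  return {
    toCreate: desired.filter((r) => !deployedSet.has(r)),
    toDelete: deployed.filter((r) => !desiredSet.has(r)),
  };
}

// Additive-only provisioning applies creations and never acts on toDelete.
function applyAdditive(desired: string[], deployed: string[]): string[] {
  return [...deployed, ...plan(desired, deployed).toCreate];
}

// Declarative reconciliation also removes what has left the source.
function applyDeclarative(desired: string[], deployed: string[]): string[] {
  const p = plan(desired, deployed);
  return deployed.filter((r) => !p.toDelete.includes(r)).concat(p.toCreate);
}

const deployed = ["statefulset/val-a", "statefulset/val-b"];
const desired = ["statefulset/val-a", "statefulset/val-c"]; // val-b removed from source

console.log(applyAdditive(desired, deployed)); // val-b is still present (orphaned)
console.log(applyDeclarative(desired, deployed)); // val-b is gone, val-c is added
```

With the additive approach, val-b stays in the cluster until someone deletes it by hand, which is the problem described above.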

As discussed, the ideal layout for the infrastructure repo is:

  • top-level pulumi typescript module
  • top-level index.ts defines a DigitalOcean cluster
  • a subdirectory contains all the code that can be open-sourced:
    • helm chart for polkadot-k8s (already exists)
    • pulumi component resource instantiating this helm chart (example)
  • top-level index.ts instantiates the component resource above several times

The immediate goal is to replicate the "midl-prod" internal repo. I should be able to quickly modify index.ts for:

  • upgrading polkadot version in one validator or all of them
  • adding a new validator

For now, this can be done by copy-and-paste followed by pulumi up.
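One way index.ts could keep both changes small is to describe validators as data and derive the helm values from that. The sketch below is plain TypeScript with no Pulumi imports. Every identifier, version string, and the shape of the values object is a hypothetical placeholder, not the actual polkadot-k8s chart schema. In the real index.ts, each entry would be handed to the component resource that instantiates the chart.

```typescript
// Hypothetical description of one validator; not the real chart schema.
interface ValidatorSpec {
  id: string; // used here as both release name and namespace
  chain: string;
  polkadotVersion?: string; // optional per-validator override
}

// Hypothetical values layout passed to the helm chart.
interface HelmValues {
  chain: string;
  image: { tag: string };
}

// Placeholder version: change this line to upgrade every validator at once.
const DEFAULT_POLKADOT_VERSION = "v0.0.0";

const validators: ValidatorSpec[] = [
  { id: "val-a", chain: "polkadot" },
  // Upgrade a single validator by overriding its version:
  { id: "val-b", chain: "kusama", polkadotVersion: "v0.0.1" },
  // Adding a validator is one more entry:
  { id: "val-c", chain: "polkadot" },
];

function toHelmValues(spec: ValidatorSpec): HelmValues {
  return {
    chain: spec.chain,
    image: { tag: spec.polkadotVersion ?? DEFAULT_POLKADOT_VERSION },
  };
}

// One release per validator; a Pulumi program would create a component
// resource from each of these instead of just collecting them.
const releases = validators.map((v) => ({
  releaseName: v.id,
  namespace: v.id,
  values: toHelmValues(v),
}));

const tags = releases.map((r) => r.values.image.tag);
console.log(tags.join(", ")); // v0.0.0, v0.0.1, v0.0.0
```

This keeps the copy-and-paste down to a single list entry per validator, while still allowing a per-validator override when only one node should be upgraded.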

Later it will be easy to:

  • turn the open-source repo into a submodule with tutorials
  • add pulumi CI to auto-deploy with GitHub Actions
  • support helm release instead of in-repo helm chart