m-lab/etl-gardener

Adjust node pools in data-processing clusters


The most recent change to the etl k8s configs failed to launch pods in staging, because I had set the 8-core node pool to 0 instances.

Also, utilization is low in all projects, because there are more nodes than the requested pods need.

All the pools should be updated to use appropriate auto-scaling configs.
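
For context, it may help to inspect the current pool configuration before changing anything. A minimal sketch, assuming the cluster is named `data-processing` (per this issue's title) and is a regional cluster in us-east1:

```sh
# Sketch: inspect current node pools before adjusting autoscaling.
# The --region and --project values are assumptions based on this thread.
gcloud container node-pools list \
    --cluster=data-processing --project=mlab-sandbox --region=us-east1

# Shows autoscaling settings, node counts, and zones for one pool.
# "default-pool" is GKE's usual name for the default pool; adjust as needed.
gcloud container node-pools describe default-pool \
    --cluster=data-processing --project=mlab-sandbox --region=us-east1
```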

Today I am (rough gcloud sketches follow this list):
- changing the sandbox default pool to allow 1 node per zone, and removing zone us-east1-d
- changing the staging 8-core parser-pool1 to allow 0-2 nodes per zone
- changing the staging 4-core parser-pool to allow 0-1 nodes per zone
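
Roughly what those changes look like with gcloud (a sketch, not the exact commands run; the remaining sandbox zones are assumptions). One detail worth remembering: for regional clusters, `--min-nodes`/`--max-nodes` apply per zone, which matches the "N nodes per zone" wording above. gcloud generally accepts only one kind of node-pool update per invocation, hence the separate commands.

```sh
# mlab-sandbox: drop us-east1-d from the default pool (the remaining zones
# listed here are assumptions), then allow 1 node per zone.
gcloud container node-pools update default-pool \
    --cluster=data-processing --project=mlab-sandbox --region=us-east1 \
    --node-locations=us-east1-b,us-east1-c
gcloud container node-pools update default-pool \
    --cluster=data-processing --project=mlab-sandbox --region=us-east1 \
    --enable-autoscaling --min-nodes=1 --max-nodes=1

# mlab-staging: 8-core parser-pool1, 0-2 nodes per zone.
gcloud container node-pools update parser-pool1 \
    --cluster=data-processing --project=mlab-staging --region=us-east1 \
    --enable-autoscaling --min-nodes=0 --max-nodes=2

# mlab-staging: 4-core parser-pool, 0-1 nodes per zone.
gcloud container node-pools update parser-pool \
    --cluster=data-processing --project=mlab-staging --region=us-east1 \
    --enable-autoscaling --min-nodes=0 --max-nodes=1
```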

Tomorrow, I intend to change prod to set up auto-scaling for parser, default, and gardener pools.

Removing zone us-east1-d from the default pool in mlab-sandbox made gardener unschedulable, because its persistent volume lives in that zone. I restored us-east1-d and adjusted auto-scaling to allow 1-2 nodes per zone.
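
That failure mode is the usual zonal-disk constraint: a GCE persistent disk lives in exactly one zone, so a pod bound to a PV backed by it can only schedule onto a node in that zone. A quick way to confirm which zone a PV is pinned to (the PV name below is a placeholder):

```sh
# List persistent volumes, then check the zone affinity of gardener's PV.
# <gardener-pv> is a placeholder; use a name from the first command.
kubectl get pv
kubectl describe pv <gardener-pv> | grep -i -A4 'Node Affinity'
```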

Later changed this to 0-2 nodes per zone.

Subsequently updated mlab-sandbox parser-pool1 to allow 0-2 nodes per zone as well.

@gfr10598 can you enumerate the steps that are needed before the next gardener release to production?

I just deleted the mlab-staging parser-pool1 node pool from the data-processing cluster. K8s had restarted the parsers, and they were scheduled onto the wrong node pool. Deleting the pool should prevent this from happening again.
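
For the record, the deletion is a one-liner, and pinning the parsers to their intended pool via a nodeSelector on GKE's built-in node label would make a repeat impossible. A sketch, assuming the deployment is named `etl-parser` and the intended pool is `parser-pool` (both assumptions):

```sh
# Remove the unused pool; region/project values assumed as above.
gcloud container node-pools delete parser-pool1 \
    --cluster=data-processing --project=mlab-staging --region=us-east1

# Pin parser pods to the intended pool so a restart cannot land them
# elsewhere. "etl-parser" and "parser-pool" are assumed names.
kubectl patch deployment etl-parser --type=merge -p \
    '{"spec":{"template":{"spec":{"nodeSelector":{"cloud.google.com/gke-nodepool":"parser-pool"}}}}}'
```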

See m-lab/etl#985, related to propagating errors from etl to gardener.