I followed the documentation to update the certificate and the cluster crashed.
siaimes opened this issue · 2 comments
Organization Name:
Short summary about the issue/question:
DOC: https://github.com/microsoft/pai/blob/master/docs/manual/cluster-admin/how-to-renew-k8s-cert.md
The root of the issue lies in this line of code:
ansible-playbook -i hosts.yml --limit '!master-node' --become --become-user root renew-worker-cert.yaml
As shown in the figure, the master node should be excluded with '!kube-master' instead of '!master-node'. With the current pattern the master node is not excluded, so it renews itself as a worker node and the cluster crashes.
So this line should be changed to:
ansible-playbook -i hosts.yml --limit '!kube-master' --become --become-user root renew-worker-cert.yaml
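Before running the renewal for real, it may help to preview which hosts the limit pattern actually selects. The sketch below is only a suggestion and is not part of the original doc: the function name preview_worker_renewal and the ANSIBLE_PLAYBOOK override variable are hypothetical, and it assumes the inventory group for masters is named kube-master, as described above. It relies on the --list-hosts flag of ansible-playbook, which prints the matching hosts without executing any tasks.

```shell
# Hypothetical helper (not from the OpenPAI docs): list the hosts that the
# worker-cert playbook would touch, without running any tasks.
# Assumes the master group in the inventory is named kube-master.
preview_worker_renewal() {
  inventory="${1:-hosts.yml}"   # inventory file; defaults to hosts.yml
  # ANSIBLE_PLAYBOOK is an illustrative override so the binary can be swapped.
  "${ANSIBLE_PLAYBOOK:-ansible-playbook}" -i "$inventory" \
    --limit '!kube-master' --list-hosts renew-worker-cert.yaml
}
```

Calling preview_worker_renewal hosts.yml should list only worker nodes. If a master node appears in the output, the group name in --limit probably does not match the inventory, and the playbook should not be run as is.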
Other minor issues:
Currently, etcd in an OpenPAI cluster does not seem to use a certificate, so there is no need to run the etcd-related commands.
Brief what process you are following:
How to reproduce it:
OpenPAI Environment:
- OpenPAI version:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Hardware (e.g. core number, memory size, storage size, GPU type etc.):
- Others:
Anything else we need to know:
My one command solution for this doc:
Thanks for this. There is also an option to rotate certificates automatically; please refer to https://kubernetes.io/docs/tasks/tls/certificate-rotation/. We have an issue tracking this: #5439