SSU-DCN/podmigration-operator

kubeadm init error

Closed this issue · 11 comments

When I run kubeadm init, I get errors like this:
root@server:/home/server/Downloads/tmp/zly/podmigration-operator# sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket unix:///var/run/containerd/containerd.sock
I0124 14:53:12.678735 24843 version.go:252] remote version is much newer: v1.29.1; falling back to: stable-1.19
W0124 14:53:14.622308 24843 configset.go:250] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.16
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2024-01-24T14:53:14+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint "unix:///var/run/containerd/containerd.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
Can anyone teach me how to solve this problem? I have no idea. Thank you very much!
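For anyone hitting the same preflight failure: the output format suggests kubeadm's CRI check is shelling out to crictl, and crictl v1.24+ only speaks the v1 CRI API, which containerd 1.3.x does not implement. A quick way to check for this mismatch (a diagnostic sketch, not from the thread; the cri-tools pin at the end is an assumption based on the 1.19 target):

crictl --version                 # if this is v1.24+, it insists on the v1 CRI API
kubeadm version -o short         # the cluster target here is v1.19.x
# probe the socket directly; an old containerd answers v1alpha2 but
# returns Unimplemented for the v1 RuntimeService:
sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock version
# one possible fix is pinning cri-tools to match the cluster version, e.g.
# https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.19.0/crictl-v1.19.0-linux-amd64.tar.gz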

@120L020314
please follow this guide here: https://github.com/SSU-DCN/podmigration-operator/blob/main/init-cluster-containerd-CRIU.md
If you perform certain steps on your own instead of following it, they fall outside the scope of this Git repository.

I followed the guide at https://github.com/SSU-DCN/podmigration-operator/blob/main/init-cluster-containerd-CRIU.md, but when I run kubeadm init, it fails.
root@server:/home/server/Downloads/tmp/zly/podmigration-operator# kubeadm init --kubernetes-version stable-1.19 --pod-network-cidr=10.244.0.0/16
W0124 15:29:45.025697 33937 configset.go:250] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.16
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2024-01-24T15:29:45+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint "unix:///run/containerd/containerd.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
The commands I ran were:
sudo apt-get update
sudo apt-get install gcc
mkdir tmp
cd tmp/
mkdir zly
cd zly
sudo wget https://golang.org/dl/go1.15.5.linux-amd64.tar.gz
sudo tar -xzf go1.15.5.linux-amd64.tar.gz
sudo mv go /usr/local
sudo gedit $HOME/.profile
with the following contents:

export GOROOT=/usr/local/go
export GOPATH=$HOME/go
export GOBIN=$GOPATH/bin
export PATH=$GOROOT/bin:$GOBIN:$PATH

source $HOME/.profile
go version
sudo apt install make
wget https://github.com/containerd/containerd/releases/download/v1.3.6/containerd-1.3.6-linux-amd64.tar.gz
mkdir containerd
tar -xvf containerd-1.3.6-linux-amd64.tar.gz -C containerd
sudo mv containerd/bin/* /bin/
cd containerd/
wget https://k8s-pod-migration.obs.eu-de.otc.t-systems.com/v2/containerd
cd ..
apt install git -y
git clone https://github.com/SSU-DCN/podmigration-operator.git
cd podmigration-operator
tar -vxf binaries.tar.bz2
cd custom-binaries/
chmod +x containerd
sudo mv containerd /bin/
sudo mkdir /etc/containerd
sudo gedit /etc/containerd/config.toml
with the following contents:

[plugins]
[plugins.cri.containerd]
snapshotter = "overlayfs"
[plugins.cri.containerd.default_runtime]
runtime_type = "io.containerd.runtime.v1.linux"
runtime_engine = "/usr/local/bin/runc"
runtime_root = ""

wget https://github.com/opencontainers/runc/releases/download/v1.0.0-rc92/runc.amd64
whereis runc
sudo mv runc.amd64 runc
chmod +x runc
sudo mv runc /usr/local/bin/
sudo gedit /etc/systemd/system/containerd.service
with the following contents:

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl restart containerd
sudo systemctl status containerd
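At this point the patched containerd should be the one answering the socket. Two things worth checking here (a verification sketch, not part of the original guide): the unit file above has an [Install] section but is never enabled, so it will not start on boot (the status paste further down indeed shows it as "disabled"), and the ctr client from the containerd tarball can confirm the daemon and its CRI plugin are up:

sudo systemctl enable containerd       # make the service start on boot
sudo ctr version                       # client and daemon should both answer
sudo ctr plugins ls | grep -i cri      # the CRI plugin should report "ok"
containerd config dump | head -n 20    # shows the config containerd actually parsed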
sudo gedit /etc/sysctl.conf
adding the following:

...
net.bridge.bridge-nf-call-iptables = 1

sudo -s
sudo echo '1' > /proc/sys/net/ipv4/ip_forward
exit
sudo sysctl --system
sudo modprobe overlay
sudo modprobe br_netfilter
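A small sanity check that the modules and sysctls actually took effect (both values should print 1; this is a verification sketch, not a step from the guide):

lsmod | grep -E 'overlay|br_netfilter'
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward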
gedit /etc/hosts
with the following contents:

192.168.31.47 server
192.168.31.48 agent1
192.168.31.49 agent2

cd ..
cd ..
apt install curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt-get install kubeadm=1.19.0-00 kubelet=1.19.0-00 kubectl=1.19.0-00 -y
whereis kubeadm
whereis kubelet
git clone https://github.com/vutuong/kubernetes.git
cd podmigration-operator/custom-binaries
chmod +x kubeadm kubelet
sudo mv kubeadm kubelet /usr/bin/
sudo systemctl daemon-reload
sudo systemctl restart kubelet
sudo systemctl status kubelet
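Because the stock 1.19.0-00 packages are then overwritten by the custom binaries, it is worth confirming which kubeadm and kubelet are actually on PATH (a quick check, not part of the original guide):

which kubeadm kubelet        # both should resolve under /usr/bin
kubeadm version -o short     # should report the custom 1.19 build
kubelet --version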
sudo gedit /etc/fstab

/swapfile none swap sw 0 0

swapoff -a
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket unix:///var/run/containerd/containerd.sock
My environment is Ubuntu 18.04.6.
I don't know which step is wrong. Sorry to disturb you, but I am very interested in your work. Thank you for your help!

My containerd status looks like this:
root@server:/home/server/Downloads/tmp/zly/podmigration-operator# systemctl status containerd
● containerd.service - containerd container runtime
   Loaded: loaded (/etc/systemd/system/containerd.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2024-01-24 15:35:02 CST; 16s ago
     Docs: https://containerd.io
  Process: 35109 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
 Main PID: 35111 (containerd)
    Tasks: 14 (limit: 4630)
   CGroup: /system.slice/containerd.service
           └─35111 /bin/containerd

Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.078811340+08:00" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." type=io.containerd.gr
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.078959926+08:00" level=info msg="Start subscribing containerd event"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079008127+08:00" level=info msg="Start recovering state"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079056682+08:00" level=info msg="Start event monitor"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079063528+08:00" level=info msg="Start snapshots syncer"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079068185+08:00" level=info msg="Start cni network conf syncer"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079072115+08:00" level=info msg="Start streaming server"
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.079928061+08:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.080019996+08:00" level=info msg=serving... address=/run/containerd/containerd.sock
Jan 24 15:35:02 server containerd[35111]: time="2024-01-24T15:35:02.080032902+08:00" level=info msg="containerd successfully booted in 0.017574s"
And my kubelet status looks like this:
root@server:/home/server/Downloads/tmp/zly/podmigration-operator# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2024-01-24 15:36:15 CST; 3s ago
     Docs: https://kubernetes.io/docs/home/
  Process: 35409 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 35409 (code=exited, status=255)
Maybe it's because I haven't initialized my cluster successfully.

Sorry, and thank you for your answer. I succeeded when I built it myself on my machine; the problem was that containerd was not installed correctly.

Thank you for your interest. If you succeeded, please help by giving me a star for my fame =))). And please help to close this issue.

OK, thank you. I am trying to continue your work on my Ubuntu machine. Maybe I will encounter more questions; please help me. Thank you very much!

root@server:/home/server/Downloads/tmp/zly# kubectl get pod -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
coredns-856dbd57b4-4btrn         0/1     Pending   0          44m
coredns-856dbd57b4-lqxvw         0/1     Pending   0          44m
etcd-server                      1/1     Running   0          44m
kube-apiserver-server            1/1     Running   0          44m
kube-controller-manager-server   1/1     Running   0          44m
kube-proxy-g5brg                 1/1     Running   0          44m
kube-proxy-kqkbj                 1/1     Running   0          23s
kube-scheduler-server            1/1     Running   0          44m
root@server:/home/server/Downloads/tmp/zly# kubectl describe pod coredns-856dbd57b4-4btrn -n kube-system
Name:                 coredns-856dbd57b4-4btrn
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 <none>
Labels:               k8s-app=kube-dns
                      pod-template-hash=856dbd57b4
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-856dbd57b4
Containers:
  coredns:
    Image:       k8s.gcr.io/coredns:1.6.7
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-475rr (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-475rr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-475rr
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly op=Exists
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  2m4s (x32 over 45m)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Warning  FailedScheduling  54s                  default-scheduler  0/2 nodes are available: 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Sorry, but my CoreDNS cannot be scheduled. Maybe I need the flannel CNI plugin? @vutuong Thank you very much; I am sorry to disturb you.

@120L020314
please check your node and its taint information on your node with:

 k get nodes
 k describe node your_node_name
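For reference: on a freshly initialized cluster, the node.kubernetes.io/not-ready taint usually just means no CNI plugin has been deployed yet. Since the cluster was initialized with --pod-network-cidr=10.244.0.0/16 (flannel's default), applying the flannel manifest should let CoreDNS schedule; the URL below is the manifest location commonly used in the Kubernetes 1.19 era and is an assumption, not something confirmed in the thread:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl get nodes -w     # wait for the nodes to flip from NotReady to Ready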

@vutuong Sorry to disturb you, but when I run kubectl checkpoint simple /var/lib/kubelet/migration/simple, I find I cannot make a checkpoint, and I don't know how to solve this. simple is a pod running on agent1, and the kubelet log on that node looks like this:
Jan 24 21:49:09 agent1 kubelet[111046]: I0124 21:49:09.692462 111046 kubelet.go:1505] Checkpoint the firstime running pod to use for other scale without booting from scratch: %+vsimple
Jan 24 21:49:09 agent1 kubelet[111046]: E0124 21:49:09.692913 111046 remote_runtime.go:289] CheckpointContainer "5fab4d089320a38aa93aed6b865b306d5764ca1643ea82446e3d0097e05cb584" from runtime service failed: rpc error: code = Unimplemented desc = unknown method CheckpointContainer for service runtime.v1alpha2.RuntimeService
Jan 24 21:49:09 agent1 kubelet[111046]: I0124 21:49:09.693279 111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
Jan 24 21:49:36 agent1 kubelet[111046]: I0124 21:49:36.691280 111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
Jan 24 21:50:15 agent1 kubelet[111046]: I0124 21:50:15.691442 111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
Jan 24 21:50:19 agent1 kubelet[111046]: I0124 21:50:19.691534 111046 kuberuntime_manager.go:841] Should we migrate?Runningfalse
Please help me. Thank you very much!
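The Unimplemented CheckpointContainer error indicates that the containerd serving the socket on agent1 is a stock build without the checkpoint patch, rather than the custom binary from binaries.tar.bz2 (this matches the resolution further down). A sketch for verifying which binary is actually running, using the paths from the install steps above:

ps -o pid,cmd -C containerd                 # which containerd binary is serving the socket
ls -l /bin/containerd                       # should be the custom build from custom-binaries/
systemctl cat containerd | grep ExecStart   # the unit must point at that same path
sudo systemctl restart containerd           # restart after replacing the binary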

My virtual machine runs Ubuntu 18.04.6. When I run criu check --all, it shows:
root@agent1:/var/lib/kubelet/migration# criu check --all
Warn (criu/cr-check.c:1230): clone3() with set_tid not supported
Error (criu/cr-check.c:1272): Time namespaces are not supported
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
I do not know how to fix it. @vutuong
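For what it's worth, those two messages refer to optional kernel features (clone3() with set_tid, time namespaces) that Ubuntu 18.04's kernels predate; "Looks good" means basic checkpoint/restore should still work. A minimal smoke test outside Kubernetes, adapted from the CRIU documentation (the directory and sleep duration are arbitrary):

mkdir -p /tmp/criu-test
setsid sleep 600 < /dev/null &> /dev/null &
sudo criu dump -t $! -D /tmp/criu-test && echo "dump ok"   # kills the process after dumping
sudo criu restore -D /tmp/criu-test -d && pgrep -a sleep   # -d detaches the restored tree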

Sorry, I have solved all my questions! Your work is so meaningful for me; I learned a lot. Most of the problems were caused by containerd: I just replaced the containerd binary in /bin/ and it works normally.