ApsaraDB/PolarDB-Stack-Operator

Error with polarstack-daemon.


I installed PolarDB with install.sh and modified env.yaml according to my own configuration. However, the pods created by polarstack-daemon can't work properly. Here is the log from one of the pods:

----------------------------------------------------------------------------------------------
|                                                                                           |
| polarbox cloud branch:master commitId:b3f3fde34f4e018cf8ca28625e8d9042ee7bb1f1 
| polarbox repo https://github.com/ApsaraDB/PolarDB-Stack-Daemon.git
| polarbox commitDate Wed Oct 20 14:33:55 2021 +0800
|                                                                                           |
----------------------------------------------------------------------------------------------
start polarbox controller-manager cloud-provider
I0207 18:21:43.031321       1 main.go:48] --------------------------------------------------------------------------------------------
I0207 18:21:43.031391       1 main.go:49] |                                                                                           |
I0207 18:21:43.031398       1 main.go:50] |                              polarstack-daemon                                            |
I0207 18:21:43.031404       1 main.go:51] |                                                                                           |
I0207 18:21:43.031410       1 main.go:52] --------------------------------------------------------------------------------------------
W0207 18:21:43.032072       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] GET    /healthz                  --> github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/bizapis.health (3 handlers)
[GIN-debug] GET    /api/v1/TestConn          --> github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/bizapis.Handle.func1 (3 handlers)
[GIN-debug] GET    /api/v1/GetStandByIp      --> github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/bizapis.Handle.func1 (3 handlers)
[GIN-debug] POST   /api/v1/RequestCheckCoreVersion --> github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/bizapis.Handle.func1 (3 handlers)
[GIN-debug] POST   /api/v1/InnerCheckCoreVersion --> github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/bizapis.Handle.func1 (3 handlers)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x12651a0]

goroutine 22 [running]:
golang.org/x/crypto/ssh.(*connection).clientAuthenticate(0xc0003e5e00, 0xc000327860, 0x0, 0xa)
	/go/pkg/mod/golang.org/x/crypto@v0.0.0-20191112222119-e1110fd1c708/ssh/client_auth.go:63 +0x420
golang.org/x/crypto/ssh.(*connection).clientHandshake(0xc0003e5e00, 0xc000488e40, 0x9, 0xc000327860, 0x0, 0x0)
	/go/pkg/mod/golang.org/x/crypto@v0.0.0-20191112222119-e1110fd1c708/ssh/client.go:113 +0x2b6
golang.org/x/crypto/ssh.NewClientConn(0x180c020, 0xc0000e1bb0, 0xc000488e40, 0x9, 0xc000327380, 0x180c020, 0xc0000e1bb0, 0x0, 0x0, 0xc000488e40, ...)
	/go/pkg/mod/golang.org/x/crypto@v0.0.0-20191112222119-e1110fd1c708/ssh/client.go:83 +0xf8
golang.org/x/crypto/ssh.Dial(0x15d5acb, 0x3, 0xc000488e40, 0x9, 0xc000327380, 0xc000488e40, 0x9, 0x1)
	/go/pkg/mod/golang.org/x/crypto@v0.0.0-20191112222119-e1110fd1c708/ssh/client.go:177 +0xb3
github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager.SSHConnect(0x15d6479, 0x4, 0xc000488dc8, 0x6, 0x16, 0x2, 0x2, 0xc00003ce00)
	/go/src/github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/sshutil.go:74 +0x26a
github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager.(*SSHConnection).Init(0xc0005cb7a0, 0x4, 0xc000488dc8)
	/go/src/github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/sshutil.go:119 +0x17b
github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/node_net_status.(*PolarNodeNetworkProbe).__initSSH(0xc000529f80, 0x1b, 0xc0001184e0)
	/go/src/github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/node_net_status/node_network_probe.go:547 +0x36f
github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/node_net_status.(*PolarNodeNetworkProbe).Init(0xc000529f80, 0x0, 0x0)
	/go/src/github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/node_net_status/node_network_probe.go:164 +0xc5
github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/node_net_status.StartNodeNetworkProbe(0xc00042e000, 0xc000048540)
	/go/src/github.com/ApsaraDB/PolarDB-Stack-Daemon/polar-controller-manager/node_net_status/node_network_probe.go:116 +0x208
created by github.com/ApsaraDB/PolarDB-Stack-Daemon/cmd/daemon/app.Run
	/go/src/github.com/ApsaraDB/PolarDB-Stack-Daemon/cmd/daemon/app/contorllermanager.go:97 +0x1ae
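
From the trace, the panic happens inside clientAuthenticate, which is where golang.org/x/crypto/ssh crashes when a nil entry ends up in ssh.ClientConfig.Auth — typically because the private key could not be read or parsed and the error was ignored before calling ssh.PublicKeys. Below is a minimal sketch of the pattern that avoids this (my own illustration, not the daemon's actual code; host1:22 and the key path are placeholders):

// Minimal sketch (not the daemon's actual code): load an SSH key and fail
// fast instead of letting a nil signer reach ssh.Dial, where it would
// panic inside clientAuthenticate exactly as in the trace above.
package main

import (
	"io/ioutil"
	"log"

	"golang.org/x/crypto/ssh"
)

func dialWithKey(addr, keyPath string) (*ssh.Client, error) {
	keyBytes, err := ioutil.ReadFile(keyPath)
	if err != nil {
		return nil, err // e.g. the key file simply does not exist
	}
	signer, err := ssh.ParsePrivateKey(keyBytes)
	if err != nil {
		return nil, err
	}
	cfg := &ssh.ClientConfig{
		User:            "root",
		Auth:            []ssh.AuthMethod{ssh.PublicKeys(signer)},
		HostKeyCallback: ssh.InsecureIgnoreHostKey(), // fine for a sketch, not for production
	}
	return ssh.Dial("tcp", addr, cfg)
}

func main() {
	client, err := dialWithKey("host1:22", "/root/.ssh/id_rsa")
	if err != nil {
		log.Fatalf("ssh dial failed: %v", err)
	}
	defer client.Close()
	log.Println("connected")
}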

k8s: v1.23, 3 machines
docker: v20.10.12
mysql: v8.0.26

Here is the information about the pods:

NAME                                       READY   STATUS             RESTARTS       AGE
calico-kube-controllers-85b5b5888d-rcpmx   1/1     Running            2 (6d3h ago)   10d
calico-node-9dcsb                          1/1     Running            0              10d
calico-node-knnwv                          1/1     Running            0              10d
calico-node-wgf4h                          1/1     Running            2 (6d3h ago)   10d
coredns-64897985d-tphjz                    1/1     Running            2 (6d3h ago)   10d
coredns-64897985d-vq2cq                    1/1     Running            2 (6d3h ago)   10d
etcd-vm08-1                                1/1     Running            7 (6d3h ago)   10d
kube-apiserver-vm08-1                      1/1     Running            7 (6d3h ago)   10d
kube-controller-manager-vm08-1             1/1     Running            3 (6d3h ago)   10d
kube-proxy-ctc85                           1/1     Running            0              10d
kube-proxy-gzpxg                           1/1     Running            2 (6d3h ago)   10d
kube-proxy-vdxmm                           1/1     Running            0              10d
kube-scheduler-vm08-1                      1/1     Running            3 (6d3h ago)   10d
manager-65dcc96d8d-49d4z                   1/1     Running            0              6m44s
manager-65dcc96d8d-6r6ql                   1/1     Running            0              6m44s
manager-65dcc96d8d-l9rvp                   1/1     Running            0              6m44s
polardb-sms-manager-66db8bbcbf-4dr7q       1/1     Running            0              6m44s
polardb-sms-manager-66db8bbcbf-6mhpc       1/1     Running            0              6m44s
polardb-sms-manager-66db8bbcbf-qzvwf       1/1     Running            0              6m44s
polarstack-daemon-2fpcg                    0/1     CrashLoopBackOff   6 (48s ago)    6m44s
polarstack-daemon-knpxs                    0/1     CrashLoopBackOff   6 (47s ago)    6m44s
polarstack-daemon-mthf7                    0/1     CrashLoopBackOff   6 (59s ago)    6m44s

Here is the information about the ConfigMaps:

NAME                                                              DATA   AGE
calico-config                                                     4      10d
ccm-config                                                        6      24m
cloud-provider-port-usage-vm08-1                                  0      2d5h
cloud-provider-port-usage-vm08-2                                  0      2d5h
cloud-provider-port-usage-vm08-3                                  0      2d5h
cloud-provider-wwid-usage-vm08-2                                  0      4h51m
cloud-provider-wwid-usage-vm08-3                                  0      4h51m
controller-config                                                 27     24m
coredns                                                           1      10d
extension-apiserver-authentication                                6      10d
instance-system-resources                                         3      24m
kube-proxy                                                        2      10d
kube-root-ca.crt                                                  1      10d
kubeadm-config                                                    1      10d
kubelet-config-1.23                                               1      10d
metabase-config                                                   1      24m
mpd.polardb.aliyun.com                                            0      6d2h
polardb-sms-manager                                               1      24m
polardb4mpd-controller                                            5      24m
polarstack-daemon-version-availability-vm08-1                     2      2d5h
polarstack-daemon-version-availability-vm08-2                     2      2d5h
polarstack-daemon-version-availability-vm08-3                     2      2d5h
postgresql-1-0-level-polar-o-x4-large-config-rwo                  17     24m
postgresql-1-0-level-polar-o-x4-large-resource-rwo                12     24m
postgresql-1-0-level-polar-o-x4-medium-config-rwo                 17     24m
postgresql-1-0-level-polar-o-x4-medium-resource-rwo               12     24m
postgresql-1-0-level-polar-o-x4-xlarge-config-rwo                 17     24m
postgresql-1-0-level-polar-o-x4-xlarge-resource-rwo               12     24m
postgresql-1-0-level-polar-o-x8-12xlarge-config-rwo               17     24m
postgresql-1-0-level-polar-o-x8-12xlarge-exclusive-config-rwo     17     24m
postgresql-1-0-level-polar-o-x8-12xlarge-exclusive-resource-rwo   13     24m
postgresql-1-0-level-polar-o-x8-12xlarge-resource-rwo             14     24m
postgresql-1-0-level-polar-o-x8-2xlarge-config-rwo                17     24m
postgresql-1-0-level-polar-o-x8-2xlarge-exclusive-config-rwo      17     24m
postgresql-1-0-level-polar-o-x8-2xlarge-exclusive-resource-rwo    13     24m
postgresql-1-0-level-polar-o-x8-2xlarge-resource-rwo              12     24m
postgresql-1-0-level-polar-o-x8-4xlarge-config-rwo                17     24m
postgresql-1-0-level-polar-o-x8-4xlarge-exclusive-config-rwo      17     24m
postgresql-1-0-level-polar-o-x8-4xlarge-exclusive-resource-rwo    13     24m
postgresql-1-0-level-polar-o-x8-4xlarge-resource-rwo              12     24m
postgresql-1-0-level-polar-o-x8-xlarge-config-rwo                 17     24m
postgresql-1-0-level-polar-o-x8-xlarge-resource-rwo               12     24m
postgresql-1-0-minor-version-info-rwo-image-open                  6      24m
postgresql-1-0-mycnf-template-rwo                                 1      24m

Please make sure that all the hosts can be logged in to over SSH without a password, e.g. ssh host1 (without needing to enter the root password).

https://github.com/ApsaraDB/PolarDB-Stack-Operator/blob/master/docs/install_quick.md
Passwordless SSH has already been configured between all the machines, and every host can log in to itself as the current root user via ssh hostX: for example, ssh host1 on host 1, ssh host2 on host 2, and so on.
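
If that is not configured yet, generating a key with ssh-keygen -t rsa and then running ssh-copy-id root@hostX against every host (including the local one) should satisfy this requirement.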

I have solved it now, and I found another cause: PublicKeyFilePath is hard-coded as /root/.ssh/id_rsa in PolarDB-Stack-Daemon's sshutil.go, while on my machine the key happens to be /root/.ssh/id_ed25519, which caused this problem. 😑
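
In case it helps others: a hypothetical fix (a sketch against golang.org/x/crypto/ssh, not the daemon's actual code) would be to probe the common default key locations instead of hard-coding a single path:

package main

import (
	"fmt"
	"io/ioutil"
	"log"

	"golang.org/x/crypto/ssh"
)

// loadDefaultSigner is a hypothetical helper: it tries the usual default
// private-key paths in order and returns the first key that parses.
func loadDefaultSigner() (ssh.Signer, error) {
	candidates := []string{
		"/root/.ssh/id_rsa",     // the path sshutil.go hard-codes today
		"/root/.ssh/id_ed25519", // the key my machines actually have
	}
	for _, path := range candidates {
		keyBytes, err := ioutil.ReadFile(path)
		if err != nil {
			continue // key file not present or unreadable; try the next one
		}
		return ssh.ParsePrivateKey(keyBytes)
	}
	return nil, fmt.Errorf("no usable private key found in %v", candidates)
}

func main() {
	if _, err := loadDefaultSigner(); err != nil {
		log.Fatal(err)
	}
	log.Println("found a usable private key")
}

Until something like that exists upstream, the simpler workaround is to make sure /root/.ssh/id_rsa exists on every host (e.g. generate one with ssh-keygen -t rsa and redistribute it with ssh-copy-id).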