microsoft/SDN

Errors on start-kubelet.ps1 and start-kubeproxy.ps1 on Windows Server 2019

Vacant0mens opened this issue · 11 comments

The node shows up on the master as Ready, but its Roles column is <none>, and the following errors show up when the last line of start-kubelet.ps1 is run.
(This is being run on Windows Server 2019 on Hyper-V.)

Detected kubelet version 1.15.3
Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --enable-debugging-handlers has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --hairpin-mode has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cgroups-per-qos has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --enforce-node-allocatable has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
E0828 15:41:57.780723    7176 server.go:725] Kubelet needs to run as uid `0`. It is being run as -1
E0828 15:41:57.810709    7176 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
        For verbose messaging see aws.Config.CredentialsChainVerboseErrors
E0828 15:41:57.816711    7176 processstarttime.go:41] Could not get process start time, could not read /proc: CreateFile /proc: The system cannot find the file specified.
E0828 15:41:57.817711    7176 processstarttime.go:41] Could not get process start time, could not read /proc: CreateFile /proc: The system cannot find the file specified.
E0828 15:41:57.843750    7176 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "e576ad413c82f73d60f69972a4d8a9d45b31bd5a6c8737c4f069be698402db3e": checkpoint is not found
E0828 15:41:58.486261    7176 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-flannel-ds-amd64-tvrmj": Error response from daemon: network host not found
E0828 15:47:55.961506    7176 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "kube-flannel-ds-amd64-tvrmj_kube-system(1c25a5f3-9dbf-4aec-96e4-cc86985f0ab8)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-flannel-ds-amd64-tvrmj": Error response from daemon: network host not found
E0828 15:47:55.965536    7176 kuberuntime_manager.go:692] createPodSandbox for pod "kube-flannel-ds-amd64-tvrmj_kube-system(1c25a5f3-9dbf-4aec-96e4-cc86985f0ab8)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-flannel-ds-amd64-tvrmj": Error response from daemon: network host not found
E0828 15:47:55.978506    7176 pod_workers.go:190] Error syncing pod 1c25a5f3-9dbf-4aec-96e4-cc86985f0ab8 ("kube-flannel-ds-amd64-tvrmj_kube-system(1c25a5f3-9dbf-4aec-96e4-cc86985f0ab8)"), skipping: failed to "CreatePodSandbox" for "kube-flannel-ds-amd64-tvrmj_kube-system(1c25a5f3-9dbf-4aec-96e4-cc86985f0ab8)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-flannel-ds-amd64-tvrmj_kube-system(1c25a5f3-9dbf-4aec-96e4-cc86985f0ab8)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"kube-flannel-ds-amd64-tvrmj\": Error response from daemon: network host not found"

It just keeps looping after that, printing the same few messages.

Side note:
Is there a way to put these settings into the config file instead of getting all those "Flag * has been deprecated" messages? That's pretty annoying.
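As a rough sketch of what that could look like, the seven deprecated flags in the output above each have a matching field in the kubelet config file that the deprecation message points to. The field names below come from the KubeletConfiguration (kubelet.config.k8s.io/v1beta1) type; every value is a placeholder assumption, not what start-kubelet.ps1 actually passes, so copy the real values from the script before trying it.

```yaml
# Hypothetical kubelet config file, passed to the kubelet via --config.
# All values below are assumed placeholders; take the real ones from the
# flags that start-kubelet.ps1 sets.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: ""                  # replaces --resolv-conf
enableDebuggingHandlers: true   # replaces --enable-debugging-handlers
clusterDNS:                     # replaces --cluster-dns
  - 10.96.0.10                  # assumed DNS service IP
clusterDomain: cluster.local    # replaces --cluster-domain
hairpinMode: promiscuous-bridge # replaces --hairpin-mode
cgroupsPerQOS: false            # replaces --cgroups-per-qos
enforceNodeAllocatable: []      # replaces --enforce-node-allocatable (assumed empty)
```

The script would then need to pass --config with the path to this file and drop the individual flags, otherwise the warnings remain.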

I also found this:

PS C:\k> docker container list -a
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
76624e1c376a        mcr.microsoft.com/k8s/core/pause:1.0.0   "cmd /S /C 'cmd /c p…"   20 minutes ago      Created                                 k8s_POD_kube-flannel-ds-amd64-tvrmj_kube-system_1c25a5f3-9dbf-4aec-96e4-cc86985f0ab8_246
PS C:\k> docker start 766
Error response from daemon: network host not found
Error: failed to start containers: 766
docker container list -a
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
8bf694e01fd7        mcr.microsoft.com/k8s/core/pause:1.0.0   "cmd /S /C 'cmd /c p…"   38 seconds ago      Created                                 k8s_POD_kube-flannel-ds-amd64-tvrmj_kube-system_1c25a5f3-9dbf-4aec-96e4-cc86985f0ab8_6
docker container inspect 8bf
[
    {
        "Id": "8bf694e01fd7ebced0999db0a6d837442bc09e8431aa172b7f208ace6b0c1c26",
...
        "HostConfig": {
            "Binds": null,
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "host",
...

According to this issue, network: host isn't supported on Windows. It should be updated to l2bridge. This seems to be the only place where l2bridge isn't hard-coded or the default.

I think we determined on Slack that this was due to the kube-flannel daemonset not having a nodeSelector on it to keep it from running on Windows nodes.
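For anyone landing here, a minimal sketch of such a nodeSelector follows. It is a fragment of the DaemonSet spec, not a complete manifest, and it assumes the nodes carry the kubernetes.io/os label; older clusters may only have the beta.kubernetes.io/os variant.

```yaml
# Fragment to merge into the kube-flannel-ds-amd64 DaemonSet so its pods
# are only scheduled onto Linux nodes.
# Assumption: nodes are labelled kubernetes.io/os; on older clusters the
# label may be beta.kubernetes.io/os instead.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
```

Once the DaemonSet carries this selector, the Windows node should stop receiving the kube-flannel-ds-amd64 pod whose sandbox fails with "network host not found".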

@Vacant0mens is this using l2bridge on AWS?

It was l2bridge (which is hard-coded in a lot of the scripts), on Hyper-V VMs.

Oh OK. I wasn't sure why there was some AWS credential error in the output logs. Which docs are being followed to deploy this and which start-kubelet.ps1?

But yeah, if this issue is about Flannel not starting, then +1 to what Ben Moss is saying.

I tried to follow the Kubernetes documentation for the setup, but it seemed to be out of date, so I tried the Microsoft documentation instead, but it kind of seems like the SDN scripts aren't fully baked yet.

I'm running into a similar problem.

Log file created at: 2019/10/14 16:11:18
Running on machine: winserver-en
Binary: Built with gc go1.12.5 for windows/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E1014 16:11:18.953103 1500 server.go:725] Kubelet needs to run as uid 0. It is being run as -1
E1014 16:11:19.088628 1500 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
E1014 16:11:19.133623 1500 processstarttime.go:41] Could not get process start time, could not read /proc: CreateFile /proc: The system cannot find the file specified.
E1014 16:11:19.135622 1500 processstarttime.go:41] Could not get process start time, could not read /proc: CreateFile /proc: The system cannot find the file specified.
E1014 16:11:23.399438 1500 docker_sandbox.go:700] ResolvConfPath is empty.

This annoying noise in the logs should be resolved in the new Kubernetes docs. Can you try them out? https://kubernetes.io/docs/setup/production-environment/windows/user-guide-windows-nodes/

I got the same errors too. The node joined the cluster, but it is in the NotReady state, and it seems the kubelet is not running. Any suggestions for getting the kubelet to run on a Windows node?
Note: I don't have any CNI plugin on the Windows node, and the remaining Linux machines use Calico as the CNI plugin.

I'm having the exact same issues, and I've applied the node selector patch. Any suggestions?