NP-646: MicroShift should not cause the host IP to change on startup
adelton opened this issue · 22 comments
What happened?
I run the steps at https://microshift.io/docs/getting-started/.
What did you expect to happen?
I expected oc get pods -A and oc get nodes to show some pods and nodes. Instead they both report "No resources found".
How to reproduce it (as minimally and precisely as possible)?
- Have a fresh Fedora 36 machine with just the @core group installed (I used one in beaker).
# dnf module enable -y cri-o:1.21 ; dnf install -y cri-o cri-tools
# systemctl enable crio --now
# dnf copr enable -y @redhat-et/microshift
# dnf install -y microshift
- I skipped the firewalld steps here because firewalld was not running on my system.
# systemctl enable microshift --now
# curl -O https://mirror.openshift.com/pub/openshift-v4/$(uname -m)/clients/ocp/stable/openshift-client-linux.tar.gz
# tar -xf openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl
# mkdir ~/.kube ; ln -s /var/lib/microshift/resources/kubeadmin/kubeconfig ~/.kube/config
# oc get pods -A
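For what it's worth, the first start can take a few minutes before the node registers, so a small wait loop is a useful smoke test before concluding that "No resources found" is a real failure. This is a hypothetical helper, not part of the official getting-started steps; the kubeconfig path is the one symlinked above.

```shell
#!/bin/sh
# Smoke test after the steps above: wait for the node to register
# before treating "No resources found" as a failure.
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig

wait_for_node() {
    tries=${1:-30}                       # default: up to ~5 minutes
    while [ "$tries" -gt 0 ]; do
        # succeed as soon as any node reports Ready
        oc get nodes 2>/dev/null | grep -q ' Ready ' && return 0
        tries=$((tries - 1))
        sleep 10
    done
    return 1
}

# wait_for_node 30 && oc get pods -A
```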
Anything else we need to know?
systemctl status microshift
shows
● microshift.service - MicroShift
Loaded: loaded (/usr/lib/systemd/system/microshift.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2022-10-28 16:45:18 CEST; 36s ago
Main PID: 3708 (microshift)
Tasks: 9 (limit: 3451)
Memory: 428.9M
CPU: 9.181s
CGroup: /system.slice/microshift.service
└─ 3708 microshift run
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.958733 3708 available_controller.go:508] v1.apps.openshift.io failed with: failing or missing resp>
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.961777 3708 available_controller.go:508] v1.project.openshift.io failed with: failing or missing r>
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.961780 3708 available_controller.go:508] v1.build.openshift.io failed with: failing or missing res>
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.962059 3708 available_controller.go:508] v1.template.openshift.io failed with: failing or missing >
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.962105 3708 available_controller.go:508] v1.route.openshift.io failed with: failing or missing res>
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.962173 3708 available_controller.go:508] v1.image.openshift.io failed with: failing or missing res>
Oct 28 16:45:52 machine.example.com microshift[3708]: E1028 16:45:52.074561 3708 available_controller.go:508] v1.user.openshift.io failed with: failing or missing resp>
Oct 28 16:45:53 machine.example.com microshift[3708]: E1028 16:45:53.823184 3708 reflector.go:138] github.com/openshift/client-go/image/informers/externalversions/fact>
Oct 28 16:45:54 machine.example.com microshift[3708]: I1028 16:45:54.214773 3708 crd.go:164] Applied openshift CRD assets/crd/0000_10_config-operator_01_image.crd.yaml
Oct 28 16:45:54 machine.example.com microshift[3708]: I1028 16:45:54.214785 3708 crd.go:153] Applying openshift CRD assets/crd/0000_03_config-operator_01_proxy.crd.yaml
Assuming the clues are in earlier journal entries with the "E" (error) designation, the first microshift one is
Oct 28 16:31:25 machine.example.com microshift[2632]: E1028 16:31:25.046613 2632 controller.go:152] Unable to remove old endpoints from kubernetes service: StorageError: key not found, Code: 1, Key: /registry/masterleases/10.43.140.11, ResourceVersion: 0, AdditionalErrorMsg:
and then
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.239775 2632 reflector.go:138] github.com/openshift/openshift-controller-manager/pkg/unidling/controller/unidling_controller.go:221: Failed to watch *v1.Event: failed to list *v1.Event: events is forbidden: User "system:serviceaccount:openshift-infra:unidling-controller" cannot list resource "events" in API group "" at the cluster scope
and then a stream of
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.288753 2632 reflector.go:138] github.com/openshift/client-go/operator/informers/externalversions/factory.go:101: Failed to watch *v1alpha1.ImageContentSourcePolicy: failed to list *v1alpha1.ImageContentSourcePolicy: the server could not find the requested resource (get imagecontentsourcepolicies.operator.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309168 2632 reflector.go:138] github.com/openshift/client-go/apps/informers/externalversions/factory.go:101: Failed to watch *v1.DeploymentConfig: failed to list *v1.DeploymentConfig: the server could not find the requested resource (get deploymentconfigs.apps.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309193 2632 reflector.go:138] github.com/openshift/client-go/build/informers/externalversions/factory.go:101: Failed to watch *v1.Build: failed to list *v1.Build: the server could not find the requested resource (get builds.build.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309211 2632 reflector.go:138] github.com/openshift/client-go/build/informers/externalversions/factory.go:101: Failed to watch *v1.BuildConfig: failed to list *v1.BuildConfig: the server could not find the requested resource (get buildconfigs.build.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309228 2632 reflector.go:138] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to watch *v1.Build: failed to list *v1.Build: the server could not find the requested resource (get builds.config.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309244 2632 reflector.go:138] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to watch *v1.Proxy: failed to list *v1.Proxy: the server could not find the requested resource (get proxies.config.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309260 2632 reflector.go:138] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to watch *v1.Image: failed to list *v1.Image: the server could not find the requested resource (get images.config.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309275 2632 reflector.go:138] github.com/openshift/client-go/image/informers/externalversions/factory.go:101: Failed to watch *v1.ImageStream: failed to list *v1.ImageStream: the server could not find the requested resource (get imagestreams.image.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309290 2632 reflector.go:138] github.com/openshift/client-go/image/informers/externalversions/factory.go:101: Failed to watch *v1.Image: failed to list *v1.Image: the server could not find the requested resource (get images.image.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309307 2632 reflector.go:138] github.com/openshift/client-go/template/informers/externalversions/factory.go:101: Failed to watch *v1.TemplateInstance: failed to list *v1.TemplateInstance: the server could not find the requested resource (get templateinstances.template.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309322 2632 reflector.go:138] github.com/openshift/client-go/route/informers/externalversions/factory.go:101: Failed to watch *v1.Route: failed to list *v1.Route: the server could not find the requested resource (get routes.route.openshift.io)
Environment
- MicroShift version (use microshift version):
MicroShift Version: 4.8.0-0.microshift-2022-04-20-141053
Base OKD Version: 4.8.0-0.okd-2021-10-10-030117
- Hardware configuration:
A KVM VM.
- OS (e.g: cat /etc/os-release):
NAME="Fedora Linux"
VERSION="36 (Thirty Six)"
ID=fedora
VERSION_ID=36
VERSION_CODENAME=""
PLATFORM_ID="platform:f36"
PRETTY_NAME="Fedora Linux 36 (Thirty Six)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:36"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f36/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=36
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=36
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
- Kernel (e.g. uname -a):
Linux machine.example.com 5.19.16-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Oct 16 22:50:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Others:
Relevant logs
@adelton, the microshift.io site contains references to the old code.
Is there a reason you cannot try the instructions at https://github.com/openshift/microshift?
@ggiguash Do you have https://github.com/openshift/microshift/blob/main/docs/getting_started.md in mind? That seems to focus on running MicroShift as a VM via virt-install and a kickstart, rather than deploying on an existing RHEL or Fedora machine via rpm/dnf package installations. I don't like being forced into these types of VM installations, one reason being that they are hard to automate with beaker because I won't have the harness on that VM.
Is there a getting-started document at https://github.com/openshift/microshift which describes installation and configuration of MicroShift using the standard "have a machine + enable repo(s) + install packages + do some configuration + run services" workflow, similar to https://microshift.io/docs/getting-started/?
> Is there a getting-started document at https://github.com/openshift/microshift which describes installation and configuration of MicroShift using the standard "have a machine + enable repo(s) + install packages + do some configuration + run services" workflow, similar to https://microshift.io/docs/getting-started/?
Yes, see this page for a detailed description of how to configure a devenv.
My goal is to consume rpm-built MicroShift on a given RHEL, CentOS, or Fedora machine, not to build it from sources.
So I tried the steps from https://raw.githubusercontent.com/openshift/microshift/main/docs/config/microshift-starter.ks, basically using RHEL 8.6 and running
# CENTOS8BASE=http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages
# curl -LO -s $CENTOS8BASE/selinux-policy-3.14.3-96.el8.noarch.rpm
# curl -LO -s $CENTOS8BASE/selinux-policy-devel-3.14.3-96.el8.noarch.rpm
# curl -LO -s $CENTOS8BASE/selinux-policy-targeted-3.14.3-96.el8.noarch.rpm
# dnf localinstall -y selinux-policy*.rpm
# dnf copr enable -y @redhat-et/microshift-testing
# dnf install -y microshift
# systemctl enable microshift --now
The terminal (ssh) session eventually gets stuck. The journalctl -fl output ends with
Oct 31 14:04:51 machine.example.com microshift[17919]: kubelet E1031 14:04:51.873013 17919 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"dns-node-resolver\" with ErrImagePull: \"rpc error: code = Unknown desc = reading manifest sha256:4d182d11a30e6c3c1420502bec5b1192c43c32977060c4def96ea160172f71e7 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized\"" pod="openshift-dns/node-resolver-45796" podUID=63e75c1a-9689-45db-b646-6eea0a58ed25
Oct 31 14:04:51 machine.example.com microshift[17919]: kubelet E1031 14:04:51.874355 17919 pod_workers.go:951] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?" pod="openshift-dns/dns-default-cnq7k" podUID=e66a2aa8-c940-46ce-8ab7-ddbb92310491
Oct 31 14:04:51 machine.example.com ovs-vsctl[18532]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . "external_ids:ovn-remote=\"unix:/var/run/ovn/ovnsb_db.sock\""
Oct 31 14:04:51 machine.example.com ovs-vsctl[18533]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=10.43.140.16 external_ids:ovn-remote-probe-interval=180000 external_ids:ovn-openflow-probe-interval=180 "external_ids:hostname=\"machine.example.com\"" external_ids:ovn-monitor-all=true external_ids:ovn-ofctrl-wait-before-clear=0 external_ids:ovn-enable-lflow-cache=false external_ids:ovn-memlimit-lflow-cache-kb=870
Oct 31 14:04:51 machine.example.com ovs-vsctl[18534]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 -- clear bridge br-int netflow -- clear bridge br-int sflow -- clear bridge br-int ipfix
Oct 31 14:04:51 machine.example.com ovs-vsctl[18536]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 -- --if-exists del-port br-int k8s-machine.example -- --may-exist add-port br-int ovn-k8s-mp0 -- set interface ovn-k8s-mp0 type=internal mtu_request=1400 external-ids:iface-id=k8s-machine.example.com
Oct 31 14:04:51 machine.example.com NetworkManager[15509]: <info> [1667221491.9829] manager: (ovn-k8s-mp0): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/10)
Oct 31 14:04:51 machine.example.com NetworkManager[15509]: <info> [1667221491.9832] device (ovn-k8s-mp0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Oct 31 14:04:51 machine.example.com NetworkManager[15509]: <info> [1667221491.9835] manager: (ovn-k8s-mp0): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/11)
Oct 31 14:04:51 machine.example.com NetworkManager[15509]: <info> [1667221491.9837] device (ovn-k8s-mp0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Oct 31 14:04:51 machine.example.com kernel: device ovn-k8s-mp0 entered promiscuous mode
Oct 31 14:04:51 machine.example.com systemd-udevd[18539]: Using default interface naming scheme 'rhel-8.0'.
Oct 31 14:04:51 machine.example.com systemd-udevd[18539]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Oct 31 14:04:51 machine.example.com systemd-udevd[18539]: Could not generate persistent MAC address for ovn-k8s-mp0: No such file or directory
Oct 31 14:04:51 machine.example.com ovs-vsctl[18543]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set interface ovn-k8s-mp0 "mac=6e\\:29\\:33\\:8c\\:01\\:d4"
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.0000] device (ovn-k8s-mp0): carrier: link connected
Oct 31 14:04:52 machine.example.com ovs-vsctl[18565]: ovs|00001|db_ctl_base|ERR|no port named br-ex
Oct 31 14:04:52 machine.example.com ovs-vsctl[18573]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . external_ids:ovn-bridge-mappings=physnet:br-ex
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4837] manager: (patch-br-int-to-br-ex_machine.example.com): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/12)
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4839] device (patch-br-int-to-br-ex_machine.example.com): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4841] manager: (patch-br-ex_machine.example.com-to-br-int): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/13)
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4842] device (patch-br-ex_machine.example.com-to-br-int): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4845] manager: (patch-br-int-to-br-ex_machine.example.com): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/14)
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4846] manager: (patch-br-ex_machine.example.com-to-br-int): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/15)
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4848] device (patch-br-int-to-br-ex_machine.example.com): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4849] device (patch-br-ex_machine.example.com-to-br-int): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
So it seems like starting the microshift service from @redhat-et/microshift-testing messes up the networking on the machine.
The very first message in the output:
Oct 31 14:04:51 machine.example.com microshift[17919]: kubelet E1031 14:04:51.873013 17919 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"dns-node-resolver\" with ErrImagePull: \"rpc error: code = Unknown desc = reading manifest sha256:4d182d11a30e6c3c1420502bec5b1192c43c32977060c4def96ea160172f71e7 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized\"" pod="openshift-dns/node-resolver-45796" podUID=63e75c1a-9689-45db-b646-6eea0a58ed25
Looks like a problem with the pull secret.
Putting the pull secret in both ~/.pull-secret.json and /etc/crio/openshift-pull-secret seems to make that specific error message go away, but the networking still gets reconfigured in such a way that the machine is no longer accessible via ssh at its original IP address.
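The pull-secret placement described above can be sketched as a small helper. This is a hypothetical sketch, not an official procedure: the source path is an assumption, and the default destination is the /etc/crio/openshift-pull-secret path mentioned in the comment. Run it as root, then restart crio and microshift so the secret is picked up.

```shell
#!/bin/sh
# Copy the pull secret into the path CRI-O consults for registry auth.
install_pull_secret() {
    src=$1
    dst=${2:-/etc/crio/openshift-pull-secret}
    install -m 600 "$src" "$dst"    # keep the secret non-world-readable
}

# install_pull_secret ~/pull-secret.json && systemctl restart crio microshift
```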
@adelton Could you also share the log from the microshift-ovs-init systemd service, and the output of the following commands on the microshift node:
ip link show
ip addr show
ovs-vsctl show
The microshift-ovs-init service sets up an OVS bridge, br-ex, on the node interface: it flushes the IP from the node interface and regains it on the br-ex bridge. The ssh disconnection might be caused by this network change.
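For readers unfamiliar with this pattern, the kind of sequence described above can be sketched as follows. This is an illustrative sketch of the general OVS technique, not the actual microshift-ovs-init script; the uplink interface name is an assumption.

```shell
#!/bin/sh
# Illustrative sketch: move the host IP from the uplink onto an OVS
# bridge. Any ssh session riding on the uplink drops at the flush step.
move_ip_to_br_ex() {
    iface=$1
    ovs-vsctl --may-exist add-br br-ex
    ovs-vsctl --may-exist add-port br-ex "$iface"
    ip addr flush dev "$iface"    # the host IP disappears here ...
    dhclient br-ex                # ... and is (hopefully) regained on br-ex
}

# move_ip_to_br_ex eth0   # assumption: eth0 is the node's uplink
```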
The logs end with
Nov 02 15:47:20 machine.example.com systemd[1]: Started crio-conmon-5a98ec5d1c8971314fdd8e48cc9c6e240f43214be60655ccd9e6d679d30e03ee.scope.
Nov 02 15:47:20 machine.example.com systemd[1]: Started libcontainer container 5a98ec5d1c8971314fdd8e48cc9c6e240f43214be60655ccd9e6d679d30e03ee.
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.270924829+01:00" level=info msg="Created container 5a98ec5d1c8971314fdd8e48cc9c6e240f43214be60655ccd9e6d679d30e03ee: openshift-ovn-kubernetes/ovnkube-master-hbwrh/ovnkube-master" id=0bdb829e-1e18-41e9-8d24-fe5adb956706 name=/runtime.v1.RuntimeService/CreateContainer
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.271211225+01:00" level=info msg="Starting container: 5a98ec5d1c8971314fdd8e48cc9c6e240f43214be60655ccd9e6d679d30e03ee" id=da177da8-c72b-4306-87df-5822489a736f name=/runtime.v1.RuntimeService/StartContainer
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.277347279+01:00" level=info msg="Started container" PID=18543 containerID=5a98ec5d1c8971314fdd8e48cc9c6e240f43214be60655ccd9e6d679d30e03ee description=openshift-ovn-kubernetes/ovnkube-master-hbwrh/ovnkube-master id=da177da8-c72b-4306-87df-5822489a736f name=/runtime.v1.RuntimeService/StartContainer sandboxID=8ccd5dc1892194101430579909aa96b88c656f338217a3f037b7caa39596d08f
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.282084230+01:00" level=info msg="CNI monitoring event \"/opt/cni/bin/ovn-k8s-cni-overlay\": CREATE"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.287254605+01:00" level=info msg="Found CNI network crio (type=bridge) at /etc/cni/net.d/100-crio-bridge.conf"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.290184357+01:00" level=info msg="Found CNI network 200-loopback.conf (type=loopback) at /etc/cni/net.d/200-loopback.conf"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.290200823+01:00" level=info msg="CNI monitoring event \"/opt/cni/bin/ovn-k8s-cni-overlay\": WRITE"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.291713541+01:00" level=info msg="Found CNI network crio (type=bridge) at /etc/cni/net.d/100-crio-bridge.conf"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.292704807+01:00" level=info msg="Found CNI network 200-loopback.conf (type=loopback) at /etc/cni/net.d/200-loopback.conf"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.292716311+01:00" level=info msg="CNI monitoring event \"/opt/cni/bin/ovn-k8s-cni-overlay\": WRITE"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.294226316+01:00" level=info msg="Found CNI network crio (type=bridge) at /etc/cni/net.d/100-crio-bridge.conf"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.295363804+01:00" level=info msg="Found CNI network 200-loopback.conf (type=loopback) at /etc/cni/net.d/200-loopback.conf"
Nov 02 15:47:20 machine.example.com microshift[17970]: kubelet E1102 15:47:20.458263 17970 pod_workers.go:951] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?" pod="openshift-dns/dns-default-27mhp" podUID=e18e74e8-a5fb-485d-9cb6-a22b87049af2
Nov 02 15:47:20 machine.example.com ovs-vsctl[18654]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . "external_ids:ovn-remote=\"unix:/var/run/ovn/ovnsb_db.sock\""
Nov 02 15:47:20 machine.example.com ovs-vsctl[18655]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=10.43.140.21 external_ids:ovn-remote-probe-interval=180000 external_ids:ovn-openflow-probe-interval=180 "external_ids:hostname=\"machine.example.com\"" external_ids:ovn-monitor-all=true external_ids:ovn-ofctrl-wait-before-clear=0 external_ids:ovn-enable-lflow-cache=false external_ids:ovn-memlimit-lflow-cache-kb=870
Nov 02 15:47:20 machine.example.com ovs-vsctl[18656]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 -- clear bridge br-int netflow -- clear bridge br-int sflow -- clear bridge br-int ipfix
Nov 02 15:47:20 machine.example.com ovs-vsctl[18658]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 -- --if-exists del-port br-int k8s-machine.example -- --may-exist add-port br-int ovn-k8s-mp0 -- set interface ovn-k8s-mp0 type=internal mtu_request=1400 external-ids:iface-id=k8s-machine.example.com
Nov 02 15:47:20 machine.example.com NetworkManager[15603]: <info> [1667400440.5332] manager: (ovn-k8s-mp0): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/10)
Nov 02 15:47:20 machine.example.com NetworkManager[15603]: <info> [1667400440.5334] device (ovn-k8s-mp0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Nov 02 15:47:20 machine.example.com NetworkManager[15603]: <info> [1667400440.5337] manager: (ovn-k8s-mp0): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/11)
Nov 02 15:47:20 machine.example.com NetworkManager[15603]: <info> [1667400440.5338] device (ovn-k8s-mp0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Nov 02 15:47:20 machine.example.com kernel: device ovn-k8s-mp0 entered promiscuous mode
Nov 02 15:47:20 machine.example.com systemd-udevd[18661]: Using default interface naming scheme 'rhel-8.0'.
Nov 02 15:47:20 machine.example.com systemd-udevd[18661]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 02 15:47:20 machine.example.com systemd-udevd[18661]: Could not generate persistent MAC address for ovn-k8s-mp0: No such file or directory
Nov 02 15:47:20 machine.example.com ovs-vsctl[18665]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set interface ovn-k8s-mp0 "mac=d2\\:e6\\:78\\:0c\\:8d\\:2d"
Nov 02 15:47:20 machine.example.com NetworkManager[15603]: <info> [1667400440.5514] device (ovn-k8s-mp0): carrier: link connected
Nov 02 15:47:20 machine.example.com ovs-vsctl[18687]: ovs|00001|db_ctl_base|ERR|no port named br-ex
Nov 02 15:47:20 machine.example.com ovs-vsctl[18695]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . external_ids:ovn-bridge-mappings=physnet:br-ex
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0768] manager: (patch-br-ex_machine.example.com-to-br-int): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/12)
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0770] device (patch-br-ex_machine.example.com-to-br-int): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0773] manager: (patch-br-int-to-br-ex_machine.example.com): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/13)
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0773] device (patch-br-int-to-br-ex_machine.example.com): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0776] manager: (patch-br-ex_machine.example.com-to-br-int): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/14)
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0785] manager: (patch-br-int-to-br-ex_machine.example.com): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/15)
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0786] device (patch-br-ex_machine.example.com-to-br-int): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0787] device (patch-br-int-to-br-ex_machine.example.com): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
> The microshift-ovs-init service sets up an OVS bridge, br-ex, on the node interface: it flushes the IP from the node interface and regains it on the br-ex bridge. The ssh disconnection might be caused by this network change.
So what is the way to get connected back to the host machine to actually inspect the microshift-ovs-init service? Because the machine no longer responds at the original IP address, even if I try a new ssh connection ...
> The microshift-ovs-init service sets up an OVS bridge, br-ex, on the node interface: it flushes the IP from the node interface and regains it on the br-ex bridge. The ssh disconnection might be caused by this network change.
>
> So what is the way to get connected back to the host machine to actually inspect the microshift-ovs-init service? Because the machine no longer responds at the original IP address, even if I try a new ssh connection ...
Unfortunately you cannot reconnect via the original IP address if br-ex cannot regain the IP address. Is there any additional host interface or virtual console that can be used to reconnect?
There is no other physical host interface, and for automation purposes, using the console is not possible.
If this behaviour of the microshift service and the other services it starts (microshift-ovs-init) is expected, shouldn't the installation / setup instructions describe how to preserve access to the machine? For example, you say br-ex cannot regain an IP address. Does that mean it cannot redo the DHCP request? What IP address does it end up with anyway? If we captured the DHCP-provided address before installing, configuring, and running the microshift service, is there a way to "force" the same address (as a static one) onto the post-br-ex setup?
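One plausible approach to the question above is to record the current DHCP-assigned address and pin it statically before starting microshift, so that whatever interface ends up holding the address keeps the same one. This is a hypothetical sketch assuming a typical NetworkManager-managed host; the interface name and nmcli fields are assumptions, not a documented MicroShift procedure.

```shell
#!/bin/sh
# Pin the current DHCP address as static before enabling microshift.

current_cidr() {    # prints e.g. 10.43.140.16/22 for the given interface
    ip -4 -o addr show dev "$1" | awk '{print $4; exit}'
}

pin_static() {
    iface=$1
    conn="$(nmcli -g GENERAL.CONNECTION device show "$iface")"
    gw="$(ip route show default | awk '/default/ {print $3; exit}')"
    nmcli connection modify "$conn" ipv4.method manual \
        ipv4.addresses "$(current_cidr "$iface")" ipv4.gateway "$gw"
    nmcli connection up "$conn"
}

# pin_static eth0   # run before 'systemctl enable microshift --now'
```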
@adelton, could you explain whether the machine gets a different IP address, or all connectivity is lost?
I have no way of knowing. The machine is remote, so my only option for figuring out what is going on is to try to connect to it via ssh.
That's why I believe we need very solid documentation on preserving that initial IP address and keeping ssh connectivity.
My latest tests with microshift from the @redhat-et/microshift-testing copr repo on RHEL 8.6 no longer show the problem -- I'm able to ssh to the host just fine even after the node is reported as Ready and the pods are (mostly) running:
# oc get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-dns dns-default-xc5dw 2/2 Running 0 12m
openshift-dns node-resolver-5j8bj 1/1 Running 0 12m
openshift-ingress router-default-7c9c47d97f-ld7mc 1/1 Running 0 12m
openshift-ovn-kubernetes ovnkube-master-fhstc 4/4 Running 0 12m
openshift-ovn-kubernetes ovnkube-node-kspq4 1/1 Running 0 12m
openshift-service-ca service-ca-66b8869cf9-n48cv 1/1 Running 0 12m
openshift-storage topolvm-controller-78876c5fcd-kcqj9 4/4 Running 0 12m
openshift-storage topolvm-node-lj4m4 2/4 CrashLoopBackOff 14 (56s ago) 12m
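The remaining CrashLoopBackOff pod in the listing above can be inspected with standard oc commands. A sketch, with the namespace and pod name taken from the listing (they will differ on another host):

```shell
#!/bin/sh
# Inspect a crash-looping pod: events first, then logs, preferring the
# previous (crashed) container instance when one exists.
debug_pod() {
    ns=$1; pod=$2
    oc -n "$ns" describe pod "$pod"
    oc -n "$ns" logs "$pod" --all-containers --previous 2>/dev/null || \
        oc -n "$ns" logs "$pod" --all-containers
}

# debug_pod openshift-storage topolvm-node-lj4m4
```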
@adelton do you agree to close this issue, given that it works in your latest tests?
Sure, if it is clear where/how the change of behaviour happened in the code.
> Sure, if it is clear where/how the change of behaviour happened in the code.
Thanks!
/close
@zshi-redhat: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.