NP-646: MicroShift should not cause the host IP to change on startup
adelton opened this issue · 22 comments
What happened?
I run the steps at https://microshift.io/docs/getting-started/.
What did you expect to happen?
I expected oc get pods -A and oc get nodes to show some pods and nodes. Instead they both report "No resources found".
How to reproduce it (as minimally and precisely as possible)?
- Have a fresh Fedora 36 machine with just the @core group installed (I used one in beaker).
# dnf module enable -y cri-o:1.21 ; dnf install -y cri-o cri-tools
# systemctl enable crio --now
# dnf copr enable -y @redhat-et/microshift
# dnf install -y microshift
- I skipped the firewalld steps here because firewalld was not running on my system.
# systemctl enable microshift --now
# curl -O https://mirror.openshift.com/pub/openshift-v4/$(uname -m)/clients/ocp/stable/openshift-client-linux.tar.gz
# tar -xf openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl
# mkdir ~/.kube ; ln -s /var/lib/microshift/resources/kubeadmin/kubeconfig ~/.kube/config
# oc get pods -A
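For what it's worth, the first start can take a few minutes before the node registers, so a small wait loop is a useful smoke test before concluding that "No resources found" is a real failure. This is a hypothetical helper, not part of the official getting-started steps; the kubeconfig path is the one symlinked above.

```shell
#!/bin/sh
# Smoke test after the steps above: wait for the node to register
# before treating "No resources found" as a failure.
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig

wait_for_node() {
    tries=${1:-30}                       # default: up to ~5 minutes
    while [ "$tries" -gt 0 ]; do
        # succeed as soon as any node reports Ready
        oc get nodes 2>/dev/null | grep -q ' Ready ' && return 0
        tries=$((tries - 1))
        sleep 10
    done
    return 1
}

# wait_for_node 30 && oc get pods -A
```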
Anything else we need to know?
systemctl status microshift
shows
● microshift.service - MicroShift
Loaded: loaded (/usr/lib/systemd/system/microshift.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2022-10-28 16:45:18 CEST; 36s ago
Main PID: 3708 (microshift)
Tasks: 9 (limit: 3451)
Memory: 428.9M
CPU: 9.181s
CGroup: /system.slice/microshift.service
└─ 3708 microshift run
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.958733 3708 available_controller.go:508] v1.apps.openshift.io failed with: failing or missing resp>
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.961777 3708 available_controller.go:508] v1.project.openshift.io failed with: failing or missing r>
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.961780 3708 available_controller.go:508] v1.build.openshift.io failed with: failing or missing res>
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.962059 3708 available_controller.go:508] v1.template.openshift.io failed with: failing or missing >
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.962105 3708 available_controller.go:508] v1.route.openshift.io failed with: failing or missing res>
Oct 28 16:45:51 machine.example.com microshift[3708]: E1028 16:45:51.962173 3708 available_controller.go:508] v1.image.openshift.io failed with: failing or missing res>
Oct 28 16:45:52 machine.example.com microshift[3708]: E1028 16:45:52.074561 3708 available_controller.go:508] v1.user.openshift.io failed with: failing or missing resp>
Oct 28 16:45:53 machine.example.com microshift[3708]: E1028 16:45:53.823184 3708 reflector.go:138] github.com/openshift/client-go/image/informers/externalversions/fact>
Oct 28 16:45:54 machine.example.com microshift[3708]: I1028 16:45:54.214773 3708 crd.go:164] Applied openshift CRD assets/crd/0000_10_config-operator_01_image.crd.yaml
Oct 28 16:45:54 machine.example.com microshift[3708]: I1028 16:45:54.214785 3708 crd.go:153] Applying openshift CRD assets/crd/0000_03_config-operator_01_proxy.crd.yaml
Assuming the clues are in earlier journal entries with the "E" (error) designation, the first microshift one is
Oct 28 16:31:25 machine.example.com microshift[2632]: E1028 16:31:25.046613 2632 controller.go:152] Unable to remove old endpoints from kubernetes service: StorageError: key not found, Code: 1, Key: /registry/masterleases/10.43.140.11, ResourceVersion: 0, AdditionalErrorMsg:
and then
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.239775 2632 reflector.go:138] github.com/openshift/openshift-controller-manager/pkg/unidling/controller/unidling_controller.go:221: Failed to watch *v1.Event: failed to list *v1.Event: events is forbidden: User "system:serviceaccount:openshift-infra:unidling-controller" cannot list resource "events" in API group "" at the cluster scope
and then a stream of
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.288753 2632 reflector.go:138] github.com/openshift/client-go/operator/informers/externalversions/factory.go:101: Failed to watch *v1alpha1.ImageContentSourcePolicy: failed to list *v1alpha1.ImageContentSourcePolicy: the server could not find the requested resource (get imagecontentsourcepolicies.operator.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309168 2632 reflector.go:138] github.com/openshift/client-go/apps/informers/externalversions/factory.go:101: Failed to watch *v1.DeploymentConfig: failed to list *v1.DeploymentConfig: the server could not find the requested resource (get deploymentconfigs.apps.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309193 2632 reflector.go:138] github.com/openshift/client-go/build/informers/externalversions/factory.go:101: Failed to watch *v1.Build: failed to list *v1.Build: the server could not find the requested resource (get builds.build.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309211 2632 reflector.go:138] github.com/openshift/client-go/build/informers/externalversions/factory.go:101: Failed to watch *v1.BuildConfig: failed to list *v1.BuildConfig: the server could not find the requested resource (get buildconfigs.build.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309228 2632 reflector.go:138] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to watch *v1.Build: failed to list *v1.Build: the server could not find the requested resource (get builds.config.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309244 2632 reflector.go:138] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to watch *v1.Proxy: failed to list *v1.Proxy: the server could not find the requested resource (get proxies.config.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309260 2632 reflector.go:138] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to watch *v1.Image: failed to list *v1.Image: the server could not find the requested resource (get images.config.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309275 2632 reflector.go:138] github.com/openshift/client-go/image/informers/externalversions/factory.go:101: Failed to watch *v1.ImageStream: failed to list *v1.ImageStream: the server could not find the requested resource (get imagestreams.image.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309290 2632 reflector.go:138] github.com/openshift/client-go/image/informers/externalversions/factory.go:101: Failed to watch *v1.Image: failed to list *v1.Image: the server could not find the requested resource (get images.image.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309307 2632 reflector.go:138] github.com/openshift/client-go/template/informers/externalversions/factory.go:101: Failed to watch *v1.TemplateInstance: failed to list *v1.TemplateInstance: the server could not find the requested resource (get templateinstances.template.openshift.io)
Oct 28 16:31:27 machine.example.com microshift[2632]: E1028 16:31:27.309322 2632 reflector.go:138] github.com/openshift/client-go/route/informers/externalversions/factory.go:101: Failed to watch *v1.Route: failed to list *v1.Route: the server could not find the requested resource (get routes.route.openshift.io)
Environment
- MicroShift version (use microshift version):
MicroShift Version: 4.8.0-0.microshift-2022-04-20-141053
Base OKD Version: 4.8.0-0.okd-2021-10-10-030117
- Hardware configuration:
A KVM VM.
- OS (e.g: cat /etc/os-release):
NAME="Fedora Linux"
VERSION="36 (Thirty Six)"
ID=fedora
VERSION_ID=36
VERSION_CODENAME=""
PLATFORM_ID="platform:f36"
PRETTY_NAME="Fedora Linux 36 (Thirty Six)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:36"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f36/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=36
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=36
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
- Kernel (e.g. uname -a):
Linux machine.example.com 5.19.16-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Oct 16 22:50:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Others:
Relevant logs
@adelton, the microshift.io site contains references to the old code.
Is there a reason you cannot try the instructions at https://github.com/openshift/microshift?
@ggiguash Do you have https://github.com/openshift/microshift/blob/main/docs/getting_started.md in mind? That seems to focus on running MicroShift as a VM via virt-install and a kickstart, rather than deploying on an existing RHEL or Fedora machine via rpm/dnf package installations. I don't like being forced into these types of VM installations, one reason being that they are hard to automate with beaker because I won't have the harness on that VM.
Is there a getting-started document at https://github.com/openshift/microshift which describes installation and configuration of MicroShift using the standard "have a machine + enable repo(s) + install packages + do some configuration + run services" workflow, similar to https://microshift.io/docs/getting-started/?
> Is there a getting-started document at https://github.com/openshift/microshift which describes installation and configuration of MicroShift using the standard "have a machine + enable repo(s) + install packages + do some configuration + run services" workflow, similar to https://microshift.io/docs/getting-started/?
Yes, see this page for a detailed description of how to configure a devenv.
My goal is to consume rpm-built MicroShift on a given RHEL, CentOS, or Fedora machine, not to build it from sources.
So I tried the steps from https://raw.githubusercontent.com/openshift/microshift/main/docs/config/microshift-starter.ks, basically using RHEL 8.6 and running
# CENTOS8BASE=http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages
# curl -LO -s $CENTOS8BASE/selinux-policy-3.14.3-96.el8.noarch.rpm
# curl -LO -s $CENTOS8BASE/selinux-policy-devel-3.14.3-96.el8.noarch.rpm
# curl -LO -s $CENTOS8BASE/selinux-policy-targeted-3.14.3-96.el8.noarch.rpm
# dnf localinstall -y selinux-policy*.rpm
# dnf copr enable -y @redhat-et/microshift-testing
# dnf install -y microshift
# systemctl enable microshift --now
The terminal (ssh) session eventually gets stuck. The journalctl -fl output ends with
Oct 31 14:04:51 machine.example.com microshift[17919]: kubelet E1031 14:04:51.873013 17919 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"dns-node-resolver\" with ErrImagePull: \"rpc error: code = Unknown desc = reading manifest sha256:4d182d11a30e6c3c1420502bec5b1192c43c32977060c4def96ea160172f71e7 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized\"" pod="openshift-dns/node-resolver-45796" podUID=63e75c1a-9689-45db-b646-6eea0a58ed25
Oct 31 14:04:51 machine.example.com microshift[17919]: kubelet E1031 14:04:51.874355 17919 pod_workers.go:951] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?" pod="openshift-dns/dns-default-cnq7k" podUID=e66a2aa8-c940-46ce-8ab7-ddbb92310491
Oct 31 14:04:51 machine.example.com ovs-vsctl[18532]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . "external_ids:ovn-remote=\"unix:/var/run/ovn/ovnsb_db.sock\""
Oct 31 14:04:51 machine.example.com ovs-vsctl[18533]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=10.43.140.16 external_ids:ovn-remote-probe-interval=180000 external_ids:ovn-openflow-probe-interval=180 "external_ids:hostname=\"machine.example.com\"" external_ids:ovn-monitor-all=true external_ids:ovn-ofctrl-wait-before-clear=0 external_ids:ovn-enable-lflow-cache=false external_ids:ovn-memlimit-lflow-cache-kb=870
Oct 31 14:04:51 machine.example.com ovs-vsctl[18534]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 -- clear bridge br-int netflow -- clear bridge br-int sflow -- clear bridge br-int ipfix
Oct 31 14:04:51 machine.example.com ovs-vsctl[18536]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 -- --if-exists del-port br-int k8s-machine.example -- --may-exist add-port br-int ovn-k8s-mp0 -- set interface ovn-k8s-mp0 type=internal mtu_request=1400 external-ids:iface-id=k8s-machine.example.com
Oct 31 14:04:51 machine.example.com NetworkManager[15509]: <info> [1667221491.9829] manager: (ovn-k8s-mp0): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/10)
Oct 31 14:04:51 machine.example.com NetworkManager[15509]: <info> [1667221491.9832] device (ovn-k8s-mp0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Oct 31 14:04:51 machine.example.com NetworkManager[15509]: <info> [1667221491.9835] manager: (ovn-k8s-mp0): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/11)
Oct 31 14:04:51 machine.example.com NetworkManager[15509]: <info> [1667221491.9837] device (ovn-k8s-mp0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Oct 31 14:04:51 machine.example.com kernel: device ovn-k8s-mp0 entered promiscuous mode
Oct 31 14:04:51 machine.example.com systemd-udevd[18539]: Using default interface naming scheme 'rhel-8.0'.
Oct 31 14:04:51 machine.example.com systemd-udevd[18539]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Oct 31 14:04:51 machine.example.com systemd-udevd[18539]: Could not generate persistent MAC address for ovn-k8s-mp0: No such file or directory
Oct 31 14:04:51 machine.example.com ovs-vsctl[18543]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set interface ovn-k8s-mp0 "mac=6e\\:29\\:33\\:8c\\:01\\:d4"
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.0000] device (ovn-k8s-mp0): carrier: link connected
Oct 31 14:04:52 machine.example.com ovs-vsctl[18565]: ovs|00001|db_ctl_base|ERR|no port named br-ex
Oct 31 14:04:52 machine.example.com ovs-vsctl[18573]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . external_ids:ovn-bridge-mappings=physnet:br-ex
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4837] manager: (patch-br-int-to-br-ex_machine.example.com): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/12)
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4839] device (patch-br-int-to-br-ex_machine.example.com): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4841] manager: (patch-br-ex_machine.example.com-to-br-int): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/13)
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4842] device (patch-br-ex_machine.example.com-to-br-int): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4845] manager: (patch-br-int-to-br-ex_machine.example.com): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/14)
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4846] manager: (patch-br-ex_machine.example.com-to-br-int): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/15)
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4848] device (patch-br-int-to-br-ex_machine.example.com): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Oct 31 14:04:52 machine.example.com NetworkManager[15509]: <info> [1667221492.4849] device (patch-br-ex_machine.example.com-to-br-int): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
So it seems like starting the microshift service from @redhat-et/microshift-testing messes up the networking on the machine.
The very first message in the output:
Oct 31 14:04:51 machine.example.com microshift[17919]: kubelet E1031 14:04:51.873013 17919 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"dns-node-resolver\" with ErrImagePull: \"rpc error: code = Unknown desc = reading manifest sha256:4d182d11a30e6c3c1420502bec5b1192c43c32977060c4def96ea160172f71e7 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized\"" pod="openshift-dns/node-resolver-45796" podUID=63e75c1a-9689-45db-b646-6eea0a58ed25
Looks like a problem with the pull secret.
Putting the pull secret in both ~/.pull-secret.json and /etc/crio/openshift-pull-secret seems to make that specific error message go away, but the networking still gets reconfigured in such a way that the machine is no longer accessible via ssh at its original IP address.
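The pull-secret placement described above can be sketched as a small helper. This is a hypothetical sketch, not an official procedure: the source path is an assumption, and the default destination is the /etc/crio/openshift-pull-secret path mentioned in the comment. Run it as root, then restart crio and microshift so the secret is picked up.

```shell
#!/bin/sh
# Copy the pull secret into the path CRI-O consults for registry auth.
install_pull_secret() {
    src=$1
    dst=${2:-/etc/crio/openshift-pull-secret}
    install -m 600 "$src" "$dst"    # keep the secret non-world-readable
}

# install_pull_secret ~/pull-secret.json && systemctl restart crio microshift
```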
@adelton Could you also share the log from the microshift-ovs-init systemd service, and the output of the following commands on the microshift node:
ip link show
ip addr show
ovs-vsctl show
The microshift-ovs-init service sets up an OVS bridge, br-ex, on the node interface: it flushes the IP from the node interface and regains it on the br-ex bridge. The ssh disconnection might be caused by this network change.
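For readers unfamiliar with this pattern, the kind of sequence described above can be sketched as follows. This is an illustrative sketch of the general OVS technique, not the actual microshift-ovs-init script; the uplink interface name is an assumption.

```shell
#!/bin/sh
# Illustrative sketch: move the host IP from the uplink onto an OVS
# bridge. Any ssh session riding on the uplink drops at the flush step.
move_ip_to_br_ex() {
    iface=$1
    ovs-vsctl --may-exist add-br br-ex
    ovs-vsctl --may-exist add-port br-ex "$iface"
    ip addr flush dev "$iface"    # the host IP disappears here ...
    dhclient br-ex                # ... and is (hopefully) regained on br-ex
}

# move_ip_to_br_ex eth0   # assumption: eth0 is the node's uplink
```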
The logs end with
Nov 02 15:47:20 machine.example.com systemd[1]: Started crio-conmon-5a98ec5d1c8971314fdd8e48cc9c6e240f43214be60655ccd9e6d679d30e03ee.scope.
Nov 02 15:47:20 machine.example.com systemd[1]: Started libcontainer container 5a98ec5d1c8971314fdd8e48cc9c6e240f43214be60655ccd9e6d679d30e03ee.
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.270924829+01:00" level=info msg="Created container 5a98ec5d1c8971314fdd8e48cc9c6e240f43214be60655ccd9e6d679d30e03ee: openshift-ovn-kubernetes/ovnkube-master-hbwrh/ovnkube-master" id=0bdb829e-1e18-41e9-8d24-fe5adb956706 name=/runtime.v1.RuntimeService/CreateContainer
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.271211225+01:00" level=info msg="Starting container: 5a98ec5d1c8971314fdd8e48cc9c6e240f43214be60655ccd9e6d679d30e03ee" id=da177da8-c72b-4306-87df-5822489a736f name=/runtime.v1.RuntimeService/StartContainer
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.277347279+01:00" level=info msg="Started container" PID=18543 containerID=5a98ec5d1c8971314fdd8e48cc9c6e240f43214be60655ccd9e6d679d30e03ee description=openshift-ovn-kubernetes/ovnkube-master-hbwrh/ovnkube-master id=da177da8-c72b-4306-87df-5822489a736f name=/runtime.v1.RuntimeService/StartContainer sandboxID=8ccd5dc1892194101430579909aa96b88c656f338217a3f037b7caa39596d08f
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.282084230+01:00" level=info msg="CNI monitoring event \"/opt/cni/bin/ovn-k8s-cni-overlay\": CREATE"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.287254605+01:00" level=info msg="Found CNI network crio (type=bridge) at /etc/cni/net.d/100-crio-bridge.conf"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.290184357+01:00" level=info msg="Found CNI network 200-loopback.conf (type=loopback) at /etc/cni/net.d/200-loopback.conf"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.290200823+01:00" level=info msg="CNI monitoring event \"/opt/cni/bin/ovn-k8s-cni-overlay\": WRITE"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.291713541+01:00" level=info msg="Found CNI network crio (type=bridge) at /etc/cni/net.d/100-crio-bridge.conf"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.292704807+01:00" level=info msg="Found CNI network 200-loopback.conf (type=loopback) at /etc/cni/net.d/200-loopback.conf"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.292716311+01:00" level=info msg="CNI monitoring event \"/opt/cni/bin/ovn-k8s-cni-overlay\": WRITE"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.294226316+01:00" level=info msg="Found CNI network crio (type=bridge) at /etc/cni/net.d/100-crio-bridge.conf"
Nov 02 15:47:20 machine.example.com crio[17749]: time="2022-11-02 15:47:20.295363804+01:00" level=info msg="Found CNI network 200-loopback.conf (type=loopback) at /etc/cni/net.d/200-loopback.conf"
Nov 02 15:47:20 machine.example.com microshift[17970]: kubelet E1102 15:47:20.458263 17970 pod_workers.go:951] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?" pod="openshift-dns/dns-default-27mhp" podUID=e18e74e8-a5fb-485d-9cb6-a22b87049af2
Nov 02 15:47:20 machine.example.com ovs-vsctl[18654]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . "external_ids:ovn-remote=\"unix:/var/run/ovn/ovnsb_db.sock\""
Nov 02 15:47:20 machine.example.com ovs-vsctl[18655]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=10.43.140.21 external_ids:ovn-remote-probe-interval=180000 external_ids:ovn-openflow-probe-interval=180 "external_ids:hostname=\"machine.example.com\"" external_ids:ovn-monitor-all=true external_ids:ovn-ofctrl-wait-before-clear=0 external_ids:ovn-enable-lflow-cache=false external_ids:ovn-memlimit-lflow-cache-kb=870
Nov 02 15:47:20 machine.example.com ovs-vsctl[18656]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 -- clear bridge br-int netflow -- clear bridge br-int sflow -- clear bridge br-int ipfix
Nov 02 15:47:20 machine.example.com ovs-vsctl[18658]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 -- --if-exists del-port br-int k8s-machine.example -- --may-exist add-port br-int ovn-k8s-mp0 -- set interface ovn-k8s-mp0 type=internal mtu_request=1400 external-ids:iface-id=k8s-machine.example.com
Nov 02 15:47:20 machine.example.com NetworkManager[15603]: <info> [1667400440.5332] manager: (ovn-k8s-mp0): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/10)
Nov 02 15:47:20 machine.example.com NetworkManager[15603]: <info> [1667400440.5334] device (ovn-k8s-mp0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Nov 02 15:47:20 machine.example.com NetworkManager[15603]: <info> [1667400440.5337] manager: (ovn-k8s-mp0): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/11)
Nov 02 15:47:20 machine.example.com NetworkManager[15603]: <info> [1667400440.5338] device (ovn-k8s-mp0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Nov 02 15:47:20 machine.example.com kernel: device ovn-k8s-mp0 entered promiscuous mode
Nov 02 15:47:20 machine.example.com systemd-udevd[18661]: Using default interface naming scheme 'rhel-8.0'.
Nov 02 15:47:20 machine.example.com systemd-udevd[18661]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 02 15:47:20 machine.example.com systemd-udevd[18661]: Could not generate persistent MAC address for ovn-k8s-mp0: No such file or directory
Nov 02 15:47:20 machine.example.com ovs-vsctl[18665]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set interface ovn-k8s-mp0 "mac=d2\\:e6\\:78\\:0c\\:8d\\:2d"
Nov 02 15:47:20 machine.example.com NetworkManager[15603]: <info> [1667400440.5514] device (ovn-k8s-mp0): carrier: link connected
Nov 02 15:47:20 machine.example.com ovs-vsctl[18687]: ovs|00001|db_ctl_base|ERR|no port named br-ex
Nov 02 15:47:20 machine.example.com ovs-vsctl[18695]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . external_ids:ovn-bridge-mappings=physnet:br-ex
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0768] manager: (patch-br-ex_machine.example.com-to-br-int): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/12)
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0770] device (patch-br-ex_machine.example.com-to-br-int): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0773] manager: (patch-br-int-to-br-ex_machine.example.com): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/13)
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0773] device (patch-br-int-to-br-ex_machine.example.com): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0776] manager: (patch-br-ex_machine.example.com-to-br-int): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/14)
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0785] manager: (patch-br-int-to-br-ex_machine.example.com): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/15)
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0786] device (patch-br-ex_machine.example.com-to-br-int): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Nov 02 15:47:21 machine.example.com NetworkManager[15603]: <info> [1667400441.0787] device (patch-br-int-to-br-ex_machine.example.com): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
> The microshift-ovs-init service sets up an OVS bridge, br-ex, on the node interface: it flushes the IP from the node interface and regains it on the br-ex bridge. The ssh disconnection might be caused by this network change.
So what is the way to get connected back to the host machine to actually inspect the microshift-ovs-init service? Because the machine no longer responds at the original IP address, even if I try a new ssh connection ...
> The microshift-ovs-init service sets up an OVS bridge, br-ex, on the node interface: it flushes the IP from the node interface and regains it on the br-ex bridge. The ssh disconnection might be caused by this network change.
>
> So what is the way to get connected back to the host machine to actually inspect the microshift-ovs-init service? Because the machine no longer responds at the original IP address, even if I try a new ssh connection ...
Unfortunately you cannot reconnect via the original IP address if br-ex cannot regain the IP address. Is there any additional host interface or virtual console that can be used to reconnect?
There is no other physical host interface, and for automation purposes, using the console is not possible.
If this behaviour of the microshift service and the other services it starts (microshift-ovs-init) is expected, shouldn't the installation / setup instructions describe how to preserve access to the machine? For example, you say br-ex cannot regain an IP address. Does that mean it cannot redo the DHCP request? What IP address does it end up with anyway? If we captured the DHCP-provided address before installing, configuring, and running the microshift service, is there a way to "force" the same address (as a static one) onto the post-br-ex setup?
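One plausible approach to the question above is to record the current DHCP-assigned address and pin it statically before starting microshift, so that whatever interface ends up holding the address keeps the same one. This is a hypothetical sketch assuming a typical NetworkManager-managed host; the interface name and nmcli fields are assumptions, not a documented MicroShift procedure.

```shell
#!/bin/sh
# Pin the current DHCP address as static before enabling microshift.

current_cidr() {    # prints e.g. 10.43.140.16/22 for the given interface
    ip -4 -o addr show dev "$1" | awk '{print $4; exit}'
}

pin_static() {
    iface=$1
    conn="$(nmcli -g GENERAL.CONNECTION device show "$iface")"
    gw="$(ip route show default | awk '/default/ {print $3; exit}')"
    nmcli connection modify "$conn" ipv4.method manual \
        ipv4.addresses "$(current_cidr "$iface")" ipv4.gateway "$gw"
    nmcli connection up "$conn"
}

# pin_static eth0   # run before 'systemctl enable microshift --now'
```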
@adelton, could you explain whether the machine gets a different IP address, or all connectivity is lost?
I have no way of knowing. The machine is remote, so my only option for figuring out what is going on is to try to connect to it via ssh.
That's why I believe we need very solid documentation on preserving that initial IP address and keeping ssh connectivity.
My latest tests with microshift from the @redhat-et/microshift-testing copr repo on RHEL 8.6 no longer show the problem -- I'm able to ssh to the host just fine even after the node is reported as Ready and the pods are (mostly) running:
# oc get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-dns dns-default-xc5dw 2/2 Running 0 12m
openshift-dns node-resolver-5j8bj 1/1 Running 0 12m
openshift-ingress router-default-7c9c47d97f-ld7mc 1/1 Running 0 12m
openshift-ovn-kubernetes ovnkube-master-fhstc 4/4 Running 0 12m
openshift-ovn-kubernetes ovnkube-node-kspq4 1/1 Running 0 12m
openshift-service-ca service-ca-66b8869cf9-n48cv 1/1 Running 0 12m
openshift-storage topolvm-controller-78876c5fcd-kcqj9 4/4 Running 0 12m
openshift-storage topolvm-node-lj4m4 2/4 CrashLoopBackOff 14 (56s ago) 12m
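The remaining CrashLoopBackOff pod in the listing above can be inspected with standard oc commands. A sketch, with the namespace and pod name taken from the listing (they will differ on another host):

```shell
#!/bin/sh
# Inspect a crash-looping pod: events first, then logs, preferring the
# previous (crashed) container instance when one exists.
debug_pod() {
    ns=$1; pod=$2
    oc -n "$ns" describe pod "$pod"
    oc -n "$ns" logs "$pod" --all-containers --previous 2>/dev/null || \
        oc -n "$ns" logs "$pod" --all-containers
}

# debug_pod openshift-storage topolvm-node-lj4m4
```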
@adelton do you agree to close this issue, given that it works in your latest tests?
Sure, if it is clear where/how the change of behaviour happened in the code.
> Sure, if it is clear where/how the change of behaviour happened in the code.
Thanks!
/close
@zshi-redhat: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.