Version 1.2.2 openshift-install-powervs: the SNAT setup appears to change after running a machine config to set SMT
o10222 opened this issue · 3 comments
Brand-new cluster in the Toronto colo.
Cluster creation completed with no issues. Testing each node with ping -c 3 8.8.8.8 ; curl https://www.google.com worked on all of them.
After running an OCP machine config to set SMT levels on the worker nodes, 4 of the 6 workers now fail the above tests.
Two still work as expected.
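For anyone trying to reproduce this, the per-node test above can be looped over the workers. This is only a sketch: the `core` login user, the node names passed in, and the `SSH` override variable are assumptions for illustration, not part of the installer.

```shell
#!/usr/bin/env bash
# Sketch: run the same ping + curl test on one node over SSH and report OK/FAIL.
# Assumptions: nodes are reachable as core@<node>; SSH can be overridden for testing.

check_node() {
  local node="$1"
  # Both commands must succeed on the remote node for the check to pass.
  if ${SSH:-ssh} "core@${node}" \
      'ping -c 3 8.8.8.8 >/dev/null && curl -sS -o /dev/null https://www.google.com'; then
    echo "${node}: OK"
  else
    echo "${node}: FAIL"
  fi
}

check_all() {
  # Check every node name given on the command line, one result line per node.
  local node
  for node in "$@"; do
    check_node "$node"
  done
}
```

Example call from the bastion, with hypothetical short node names: `check_all worker-0 worker-3`.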
Failing node (worker-0):
2: env2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UNKNOWN group default qlen 1000
link/ether fa:f7:90:6a:40:20 brd ff:ff:ff:ff:ff:ff
inet 192.168.201.211/22 brd 192.168.203.255 scope global dynamic noprefixroute env2
valid_lft 10425sec preferred_lft 10425sec
inet6 fe80::f8f7:90ff:fe6a:4020/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Working node (worker-3):
2: env2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UNKNOWN group default qlen 1000
link/ether fa:57:b5:ab:dc:20 brd ff:ff:ff:ff:ff:ff
inet 192.168.201.249/22 brd 192.168.203.255 scope global dynamic noprefixroute env2
valid_lft 8597sec preferred_lft 8597sec
inet6 fe80::f857:b5ff:feab:dc20/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Routes on the failing node (worker-0):
[root@tor46-23-2a70-tor01-worker-0 ~]# ip route show
default via 192.168.201.189 dev env2 proto dhcp metric 100
10.128.0.0/14 dev tun0 scope link
172.30.0.0/16 dev tun0
192.168.200.0/22 dev env2 proto kernel scope link src 192.168.201.211 metric 100
Routes on the working node (worker-3):
[root@tor46-23-2a70-tor01-worker-3 ~]# ip route show
default via 192.168.201.189 dev env2 proto dhcp metric 100
10.128.0.0/14 dev tun0 scope link
172.30.0.0/16 dev tun0
192.168.200.0/22 dev env2 proto kernel scope link src 192.168.201.249 metric 100
Probable solution:
If the bastion is rebooted at any time, the SNAT gateway can start failing because ethtool settings do not persist across reboots. The known workaround of disabling checksum (CSUM) offloads must either be re-run after every reboot or, on RHEL, be made persistent by running: nmcli con mod <NIC> ethtool.feature-rx off
This change should be made in the code via a PR: https://github.com/ocp-power-automation/ocp4-upi-powervs/blob/master/modules/5_install/install.tf#L242
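Until that lands, the workaround could be wrapped roughly as below and run on the bastion. This is a sketch under assumptions: it takes "disabling CSUM offloads" to mean RX checksum offload (`ethtool -K <dev> rx off`, matching the `ethtool.feature-rx` setting above), and the device name, connection name, and `DRY_RUN` switch are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Sketch: turn off RX checksum offload now and persist it through NetworkManager.
# Assumptions: $1 is the NIC device name, $2 is its NetworkManager connection name.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

disable_rx_csum() {
  local nic="$1" con="$2"
  # Immediate, non-persistent change on the running interface.
  run ethtool -K "$nic" rx off
  # Persistent change stored in the connection profile (survives reboots).
  run nmcli con mod "$con" ethtool.feature-rx off
}
```

A dry run such as `DRY_RUN=1 disable_rx_csum env3 env3` (names hypothetical) prints the two commands so they can be reviewed before running them as root.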