redhat-performance/JetSki

Communication problems between clouds caused by the foreman interface being left in the up state on masters

mordechai opened this issue · 2 comments

Problem:
We have been having communication issues between OCP clusters that use public VLANs on different clouds in scalelab.
On OCP 4.7, nodes receive two default gateways because both the foreman interface and the baremetal interface are in the up state.
When deploying OCP 4.9 with JetSki, this issue does not reproduce.
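
One way to confirm the symptom on a node is to count the default routes. The following is a minimal sketch, not part of JetSki: it parses the text printed by `ip -4 route show default` and flags the case where default routes exist on more than one interface. The function names, the interface names (`eno1`, `ens2f0`) and the addresses in the sample are assumptions chosen for illustration only.

```python
from __future__ import annotations

import subprocess


def parse_default_routes(route_output: str) -> list[dict]:
    """Extract gateway, device and metric from each 'default ...' line."""
    routes = []
    for line in route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue  # skip non-default routes and blank lines
        route = {"gateway": None, "dev": None, "metric": None}
        for keyword, name in (("via", "gateway"), ("dev", "dev"), ("metric", "metric")):
            if keyword in fields:
                idx = fields.index(keyword)
                if idx + 1 < len(fields):
                    route[name] = fields[idx + 1]
        routes.append(route)
    return routes


def has_conflicting_defaults(routes: list[dict]) -> bool:
    """True when default routes are present on more than one interface."""
    return len({r["dev"] for r in routes}) > 1


def read_default_routes() -> list[dict]:
    """Run `ip -4 route show default` on the local node and parse its output."""
    result = subprocess.run(
        ["ip", "-4", "route", "show", "default"],
        capture_output=True, text=True, check=True,
    )
    return parse_default_routes(result.stdout)


# Hypothetical sample output; interface names and addresses are made up.
SAMPLE = (
    "default via 192.0.2.1 dev eno1 proto dhcp metric 100\n"
    "default via 198.51.100.1 dev ens2f0 proto dhcp metric 101\n"
)
```

Calling `has_conflicting_defaults(read_default_routes())` on a master or worker should return `True` while both the foreman and baremetal interfaces are up, assuming each of them has installed its own default route.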

What's expected:
JetSki deploys OCP 4.7 with the foreman interface in the down state and only the baremetal and provisioning networks in the up state.

Description:
When deploying with routable_api = true on an OCP version older than 4.9, the master nodes come up with the foreman interface in the up state. Instead, the foreman interface should be in the down state, and the hostname that is set should match the value shown in /etc/dnsmasq.d/ocp4-lab.conf and /home/kni/clusterconfigs/install-config.yaml.bkup, such as master-0.

For details see: https://privatebin.corp.redhat.com/?45292f78f61d0a09#EWirHUph3DCG43P8gzgoACc7CZpsoeLexDSgyH2YHuXf

We currently have the following environments, each of which uses routable_api = true (i.e. public VLANs):
Cloud 20: OCP 4.9 (masters and interfaces are set up correctly)
Cloud 15: OCP 4.7 latest (master set with foreman FQDN and foreman interface in the up state)
Cloud 38: OCP 4.7 latest (master set with foreman FQDN and foreman interface in the up state)

On both Cloud 15 and Cloud 38, the hostname values from /etc/dnsmasq.d/ocp4-lab.conf are not set on the masters because the foreman interface is active on them. The extraneous route on the workers related to the foreman interface was removed manually.

Related scalelab tickets opened and believed to be caused by this issue:
https://issues.redhat.com/browse/SCALELAB-1776
https://issues.redhat.com/browse/SCALELAB-1782

All of these deployments used the same JetSki commit: 9eca23f

I would recommend adding the following to /etc/sysctl.d/99-sysctl.conf:

net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
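
To check whether a node already carries the two settings above, a small parser over the sysctl file can help. This is a rough sketch under stated assumptions: the helper names `parse_sysctl` and `missing_settings` are hypothetical, and it only inspects the file contents, not the live kernel values.

```python
from __future__ import annotations

# The two settings recommended above.
WANTED = {
    "net.ipv4.conf.all.rp_filter": "0",
    "net.ipv4.conf.default.rp_filter": "0",
}


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, ignoring blanks and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(text: str, wanted: dict[str, str] = WANTED) -> dict[str, str]:
    """Return the wanted settings that are absent or have a different value."""
    current = parse_sysctl(text)
    return {k: v for k, v in wanted.items() if current.get(k) != v}
```

For example, `missing_settings(open("/etc/sysctl.d/99-sysctl.conf").read())` returns an empty dict when both lines are present. Note that the kernel applies the maximum of the `all` value and the per-interface value, so an interface whose own `rp_filter` is non-zero would still filter even with these two lines in place.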

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days