libvirt: Unable to access web console
rhopp opened this issue · 47 comments
Version
$ openshift-install version
v0.9.0-master
(compiled from master)
Platform (aws|libvirt|openstack):
libvirt
What happened?
I'm trying to install OpenShift 4 using this installer. It seems that everything was OK. I've done all the steps described here. The installation was fine and I was able to log in using oc
with the credentials from the installation output, but I'm not able to access the web console.
Looking at the openshift-console project, everything seems OK:
OUTPUT
╭─rhopp@dhcp-10-40-4-106 ~/go/src/github.com/openshift/installer ‹master*›
╰─$ oc project openshift-console
Already on project "openshift-console" on server "https://test1-api.tt.testing:6443".
╭─rhopp@dhcp-10-40-4-106 ~/go/src/github.com/openshift/installer ‹master*›
╰─$ oc get all
NAME READY STATUS RESTARTS AGE
pod/console-operator-79b8b8cb8d-cgpfn 1/1 Running 1 1h
pod/openshift-console-6ddfcc76b5-2kmpx 1/1 Running 0 1h
pod/openshift-console-6ddfcc76b5-sp5zm 1/1 Running 0 1h
pod/openshift-console-6ddfcc76b5-z52hq 1/1 Running 0 1h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/console ClusterIP 172.30.198.57 <none> 443/TCP 1h
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/console-operator 1 1 1 1 1h
deployment.apps/openshift-console 3 3 3 3 1h
NAME DESIRED CURRENT READY AGE
replicaset.apps/console-operator-79b8b8cb8d 1 1 1 1h
replicaset.apps/openshift-console-6ddfcc76b5 3 3 3 1h
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/console console-openshift-console.apps.test1.tt.testing console https reencrypt/Redirect None
The pods are running and the service and route are up, but accessing https://console-openshift-console.apps.test1.tt.testing in a browser fails because the hostname cannot be resolved.
As part of the setup I configured dnsmasq as described in the libvirt guide.
For example, ping test1-api.tt.testing works as expected, but ping console-openshift-console.apps.test1.tt.testing throws:
ping: console-openshift-console.apps.test1.tt.testing: Name or service not known
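For reference, the workaround discussed later in this thread (and in the installer troubleshooting doc) is a host-side wildcard entry for the *.apps subdomain. A minimal sketch, assuming the cluster name test1, the base domain tt.testing, and an ingress/worker node at 192.168.126.51 (the exact IP depends on your cluster):
# /etc/NetworkManager/dnsmasq.d/openshift.conf
server=/tt.testing/192.168.126.1
address=/.apps.test1.tt.testing/192.168.126.51
followed by systemctl reload NetworkManager so dnsmasq picks up the change.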
What you expected to happen?
Web console to be accessible.
How to reproduce it (as minimally and precisely as possible)?
Follow https://github.com/openshift/installer/blob/master/docs/dev/libvirt-howto.md (my host machine is Fedora 29)
INSTALLATION OUTPUT
╭─rhopp@localhost ~/go/src/github.com/openshift/installer/bin ‹master*›
╰─$ ./openshift-install create cluster
? SSH Public Key [Use arrows to move, type to filter, ? for more help]
/home/rhopp/.ssh/gitlab.cee.key.pub
> <none>
? SSH Public Key [Use arrows to move, type to filter, ? for more help]
> /home/rhopp/.ssh/gitlab.cee.key.pub
<none>
? SSH Public Key /home/rhopp/.ssh/gitlab.cee.key.pub
? Platform [Use arrows to move, type to filter]
> aws
libvirt
openstack
? Platform [Use arrows to move, type to filter]
aws
> libvirt
openstack
? Platform libvirt
? Libvirt Connection URI [? for help] (qemu+tcp://192.168.122.1/system)
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain [? for help] tt.testing
? Base Domain tt.testing
? Cluster Name [? for help] test1
? Cluster Name test1
? Pull Secret [? for help] ************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************* INFO Fetching OS image: redhat-coreos-maipo-47.247-qemu.qcow2.gz
INFO Creating cluster...
INFO Waiting up to 30m0s for the Kubernetes API...
INFO API v1.11.0+e3fa228 up
INFO Waiting up to 30m0s for the bootstrap-complete event...
INFO Destroying the bootstrap resources...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO Run 'export KUBECONFIG=/home/rhopp/go/src/github.com/openshift/installer/bin/auth/kubeconfig' to manage the cluster with 'oc', the OpenShift CLI.
INFO The cluster is ready when 'oc login -u kubeadmin -p 5tQwM-fXfkC-MIeAH-BmLeN' succeeds (wait a few minutes).
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.test1.tt.testing
INFO Login to the console with user: kubeadmin, password: 5tQwM-fXfkC-MIeAH-BmLeN
@crawford: Closing this issue.
In response to this:
Duplicate of #411.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@zeenix: Reopened this issue.
In response to this:
90b0d45 only documents a workaround, unfortunately.
/reopen
Has anyone had luck with the workaround posted in 90b0d45 recently? My libvirt cluster does not bring up the console operator with or without the documented workaround.
I tried setting the oauth hostname statically, without wildcards, in my dnsmasq config and I'm still getting oauth console errors.
See below.
dnsmasq config
~$ cat /etc/NetworkManager/dnsmasq.d/openshift.conf
server=/tt.testing/192.168.126.1
address=/.apps.tt.testing/192.168.126.51
address=/oauth-openshift.apps.test1.tt.testing/192.168.126.51
Sanity check that the hostname resolves to the proper node IP:
~$ ping oauth-openshift.apps.test1.tt.testing
PING oauth-openshift.apps.test1.tt.testing (192.168.126.51) 56(84) bytes of data.
64 bytes from 192.168.126.51 (192.168.126.51): icmp_seq=1 ttl=64 time=0.114 ms
64 bytes from 192.168.126.51 (192.168.126.51): icmp_seq=2 ttl=64 time=0.136 ms
Output of the crashed openshift-console pod's logs:
~$ oc logs -f console-67dbf7f789-k4gqg
2019/05/30 22:51:45 cmd/main: cookies are secure!
2019/05/30 22:51:45 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
Am I missing something?
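For what it's worth, the failing lookup in that log is made against the in-cluster DNS (172.30.0.10), which on the libvirt network forwards to the libvirt network's own dnsmasq at 192.168.126.1 rather than to the NetworkManager dnsmasq on the host, and the libvirt instance is configured with local_only for the cluster domain, so host-side address= entries are never consulted for *.apps (this is the local_only discussion further down in the thread). A quick way to see the difference from the hypervisor, assuming the default 192.168.126.0/24 cluster network, the config above, and NetworkManager's dnsmasq listening on 127.0.0.1:
dig +short oauth-openshift.apps.test1.tt.testing @192.168.126.1   # libvirt network dnsmasq: no answer
dig +short oauth-openshift.apps.test1.tt.testing @127.0.0.1       # NetworkManager dnsmasq: 192.168.126.51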
Has anyone had luck with the workaround posted in 90b0d45 recently?
I just did and except for the usual timeout issue, the cluster came up all good afaict.
/priority important-longterm
@zeenix: GitHub didn't allow me to assign the following users: cfergeau.
Note that only openshift members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide
In response to this:
@cfergeau You said you had a WIP patch to fix this at the libvirt level. Do you think you'd be able to get that in in the near future?
/assign @cfergeau
Hi. I did the same but the error still persists.
Do I need to debug the installer, or is there any other pointer?
tail -f setup/.openshift_install.log
time="2019-08-10T04:47:10+08:00" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (417 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (382 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (6 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (421 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-image-registry/image-registry" (388 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (398 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (402 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (406 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (144 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-machine-api/machine-api-operator" (408 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (411 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (391 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (394 of 422): the server does not recognize this resource, check extension API servers"
time="2019-08-10T04:54:14+08:00" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (417 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (382 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (6 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (421 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-image-registry/image-registry" (388 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (398 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (402 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (406 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (144 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-machine-api/machine-api-operator" (408 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (411 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (391 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (394 of 422): the server does not recognize this resource, check extension API servers"
time="2019-08-10T04:56:51+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209"
time="2019-08-10T04:56:51+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209: downloading update"
time="2019-08-10T04:56:56+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209"
time="2019-08-10T04:57:11+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209: 19% complete"
time="2019-08-10T04:57:22+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209: 82% complete"
time="2019-08-10T04:57:38+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209: 95% complete"
time="2019-08-10T05:00:27+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209: 95% complete"
time="2019-08-10T05:01:40+08:00" level=fatal msg="failed to initialize the cluster: Working towards 4.2.0-0.okd-2019-08-09-191209: 95% complete"
@donghwicha Your issue is unrelated to this one.
Thanks. I fixed it already.
Has anyone had luck with the workaround posted in 90b0d45 recently?
I just did and except for the usual timeout issue, the cluster came up all good afaict.
I increased my timeouts to 90 minutes but still no luck even after applying this "workaround".
I was finally successful. I made a video to help anyone else having a tough time getting through the install process: https://youtu.be/4mFMqNExRWk
To fix this, we probably want/need to make use of the new libvirt mechanism to pass verbatim options to dnsmasq, but to be able to do that, we need terraform support.
Update: it turns out we can make use of the existing XSLT feature of the terraform libvirt provider for this.
@zeenix I saw the issue was closed on the terraform side, so should we add some template in the installer here, or some other settings?
@jichenjc I was looking into this last week but without success yet. I've also heard that someone is working on this at the ingress operator level, so I'll hold off my efforts for now.
Hi,
All my services are running ....
https://twitter.com/fabiosbano/status/1175842429641080832?s=09
Best Regards,
Fabio Sbano
Thanks, @ssbano. I saw the picture; what kind of changes make that happen? Thanks a lot.
You can set up DNS (bind, on bare metal) to resolve *.apps.${domain}, and I made the changes below:
[root@argon ~]# cat /etc/NetworkManager/dnsmasq.d/openshift.conf
server=/jaguar.fsbano.com/192.168.126.1
server=/apps.jaguar.fsbano.com/172.27.15.30
[root@argon ~]#
git diff
[root@argon installer]# git diff
diff --git a/cmd/openshift-install/create.go b/cmd/openshift-install/create.go
index 9021025b6..679649d1d 100644
--- a/cmd/openshift-install/create.go
+++ b/cmd/openshift-install/create.go
@@ -238,7 +238,7 @@ func waitForBootstrapComplete(ctx context.Context, config *rest.Config, director
discovery := client.Discovery()
- apiTimeout := 30 * time.Minute
+ apiTimeout := 60 * time.Minute
logrus.Infof("Waiting up to %v for the Kubernetes API at %s...", apiTimeout, config.Host)
apiContext, cancel := context.WithTimeout(ctx, apiTimeout)
defer cancel()
@@ -279,7 +279,7 @@ func waitForBootstrapComplete(ctx context.Context, config *rest.Config, director
// and waits for the bootstrap configmap to report that bootstrapping has
// completed.
func waitForBootstrapConfigMap(ctx context.Context, client *kubernetes.Clientset) error {
- timeout := 30 * time.Minute
+ timeout := 60 * time.Minute
logrus.Infof("Waiting up to %v for bootstrapping to complete...", timeout)
waitCtx, cancel := context.WithTimeout(ctx, timeout)
@@ -317,7 +317,7 @@ func waitForBootstrapConfigMap(ctx context.Context, client *kubernetes.Clientset
// waitForInitializedCluster watches the ClusterVersion waiting for confirmation
// that the cluster has been initialized.
func waitForInitializedCluster(ctx context.Context, config *rest.Config) error {
- timeout := 30 * time.Minute
+ timeout := 60 * time.Minute
logrus.Infof("Waiting up to %v for the cluster at %s to initialize...", timeout, config.Host)
cc, err := configclient.NewForConfig(config)
if err != nil {
diff --git a/data/data/libvirt/main.tf b/data/data/libvirt/main.tf
index 9ba88c9cf..152c78dd5 100644
--- a/data/data/libvirt/main.tf
+++ b/data/data/libvirt/main.tf
@@ -54,6 +54,11 @@ resource "libvirt_network" "net" {
dns {
local_only = true
+ forwarders {
+ address = "172.27.15.30"
+ domain = "apps.${var.cluster_domain}"
+ }
+
dynamic "srvs" {
for_each = data.libvirt_network_dns_srv_template.etcd_cluster.*.rendered
content {
diff --git a/data/data/libvirt/variables-libvirt.tf b/data/data/libvirt/variables-libvirt.tf
index 53cf68bae..79d1018e2 100644
--- a/data/data/libvirt/variables-libvirt.tf
+++ b/data/data/libvirt/variables-libvirt.tf
@@ -32,7 +32,7 @@ variable "libvirt_master_ips" {
variable "libvirt_master_memory" {
type = string
description = "RAM in MiB allocated to masters"
- default = "6144"
+ default = "16384"
}
# At some point this one is likely to default to the number
diff --git a/pkg/asset/machines/libvirt/machines.go b/pkg/asset/machines/libvirt/machines.go
index 2ab6d9aa2..08847ab95 100644
--- a/pkg/asset/machines/libvirt/machines.go
+++ b/pkg/asset/machines/libvirt/machines.go
@@ -63,7 +63,7 @@ func provider(clusterID string, networkInterfaceAddress string, platform *libvir
APIVersion: "libvirtproviderconfig.openshift.io/v1beta1",
Kind: "LibvirtMachineProviderConfig",
},
- DomainMemory: 7168,
+ DomainMemory: 16384,
DomainVcpu: 4,
Ignition: &libvirtprovider.Ignition{
UserDataSecret: userDataSecret,
[root@argon installer]#
@ssbano
thanks a lot!
I actually tried the /etc/NetworkManager/dnsmasq.d/openshift.conf change and it seems that works for me (at least the console starts up).
Can I ask what the purpose of the following lines is? Thanks
+ forwarders {
+ address = "172.27.15.30"
+ domain = "apps.${var.cluster_domain}"
+ }
+
I am using named for wildcard name resolution instead of dnsmasq.
The IP address '172.27.15.30' is my physical machine running the bind service.
Best regards,
Fábio Sbano
ok, thanks for the info ~
Similar issue signature here on 4.2. Interestingly, the exact same configs (I am using Ansible to set it up) worked only the first time, and now the install constantly fails at almost the final stage with Authentication degraded. I've spent the whole day trying to find out what could cause that.
In my Bind zone ocp.example.com.zone I have *.apps IN A 192.168.1.254, where .254 is an HAProxy LB with server infnod-0 infnod-0.ocp.example.com:443 check. So basically *.apps.ocp.example.com points to the source-balanced infra nodes.
frontend ocp-kubernetes-api-server
mode tcp
option tcplog
bind api.ocp.example.com:6443
default_backend ocp-kubernetes-api-server
backend ocp-kubernetes-api-server
balance source
mode tcp
server bootstrap-0 bootstrap-0.ocp.example.com:6443 check
server master-0 master-0.ocp.example.com:6443 check
server master-1 master-1.ocp.example.com:6443 check
server master-2 master-2.ocp.example.com:6443 check
frontend ocp-machine-config-server
bind api.ocp.example.com:22623
default_backend ocp-machine-config-server
mode tcp
option tcplog
backend ocp-machine-config-server
balance source
mode tcp
server bootstrap-0 bootstrap-0.ocp.example.com:22623 check
server master-0 master-0.ocp.example.com:22623 check
server master-1 master-1.ocp.example.com:22623 check
server master-2 master-2.ocp.example.com:22623 check
frontend ocp-router-http
bind apps.ocp.example.com:80
default_backend ocp-router-http
mode tcp
option tcplog
backend ocp-router-http
balance source
mode tcp
server infnod-0 infnod-0.ocp.example.com:80 check
server infnod-1 infnod-1.ocp.example.com:80 check
frontend ocp-router-https
bind apps.ocp.example.com:443
default_backend ocp-router-https
mode tcp
option tcplog
backend ocp-router-https
balance source
mode tcp
server infnod-0 infnod-0.ocp.example.com:443 check
server infnod-1 infnod-1.ocp.example.com:443 check
It doesn't matter if I disable the bootstrap rules after bootstrapping is done.
E1027 16:04:32.356766 1 controller.go:129] {AuthenticationOperator2 AuthenticationOperator2} failed with: failed handling the route: route is not available at canonical host oauth-openshift.apps.ocp.example.com: []
If I ssh core@master-0.ocp.example.com and ping/dig oauth-openshift.apps.ocp.example.com, I get the IP of the LB node (.254).
I don't know whether the infra nodes should be in this state at this point.
Before all this, I had an issue with SELinux on my LB machine because I was missing:
semanage port -a 22623 -t http_port_t -p tcp
semanage port -a 6443 -t http_port_t -p tcp
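A quick way to confirm those ports were actually added to the http_port_t type:
semanage port -l | grep -w http_port_t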
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Is there a way for @openshift-bot to give this immunity to becoming stale?
/remove-lifecycle rotten
This is still something we want to fix. There are just a surprisingly large number of pieces that need to fall into place in the background before we can tackle this.
I might ask a stupid question, but I'll ask anyway.
What is the reason for the option local_only = true, when local_only = false would fix this issue?
local_only - (Optional) true/false: true means 'do not forward unresolved requests for this domain to the upstream DNS server'.
I ran the following test:
sed -i 's/local_only = true/local_only = false/' /root/go/src/github.com/openshift/installer/data/data/libvirt/main.tf
TAGS=libvirt hack/build.sh
mkdir /root/bin
cp -rf /root/go/src/github.com/openshift/installer/bin/openshift-install /root/bin/
yum install dnsmasq
echo -e "[main]\ndns=dnsmasq" | sudo tee /etc/NetworkManager/conf.d/openshift.conf
echo listen-address=127.0.0.1 > /etc/NetworkManager/dnsmasq.d/openshift.conf
echo bind-interfaces >> /etc/NetworkManager/dnsmasq.d/openshift.conf
echo server=8.8.8.8 >> /etc/NetworkManager/dnsmasq.d/openshift.conf
echo address=/apps.ocp.openshift.local/192.168.126.1 >> /etc/NetworkManager/dnsmasq.d/openshift.conf
systemctl reload NetworkManager
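At this point a quick sanity check of the host-side resolver (a sketch using the wildcard domain from the address= line above; it should return 192.168.126.1, which is where the load balancer below listens):
dig +short console-openshift-console.apps.ocp.openshift.local @127.0.0.1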
3x master
3x workers
and using a Container Loadbalancer
/usr/bin/podman run -d --name loadbalancer --net host \
  -e API="bootstrap=192.168.126.10:6443,master-0=192.168.126.11:6443,master-1=192.168.126.12:6443,master-2=192.168.126.13:6443" \
  -e API_LISTEN="0.0.0.0:6443" \
  -e INGRESS_HTTP="worker-0=192.168.126.51:80,worker-1=192.168.126.52:80,worker-2=192.168.126.53:80" \
  -e INGRESS_HTTP_LISTEN="0.0.0.0:80" \
  -e INGRESS_HTTPS="worker-0=192.168.126.51:443,worker-1=192.168.126.52:443,worker-2=192.168.126.53:443" \
  -e INGRESS_HTTPS_LISTEN="0.0.0.0:443" \
  -e MACHINE_CONFIG_SERVER="bootstrap=192.168.126.10:22623,master-0=192.168.126.10:22623,master-1=192.168.126.11:22623,master-2=192.168.126.12:22623" \
  -e MACHINE_CONFIG_SERVER_LISTEN="127.0.0.1:22623" \
  quay.io/redhat-emea-ssa-team/openshift-4-loadbalancer
And the installation went well.
I used to solve this by changing the APPS URL to apps.$basedomain instead of apps.$clustername.$basedomain but, since I want to use the default APPS URL, I've also solved it by modifying data/data/libvirt/main.tf. Instead of changing local_only, I added a forwarders entry just for the apps.$clustername.$basedomain domain pointing at the libvirt network gateway on the KVM host, where I have this configuration in dnsmasq managed by NetworkManager:
dns {
  local_only = true
  forwarders {
    address = "192.168.122.1"
    domain  = "apps.$clustername.$basedomain"
  }
}
This is the KVM dnsmasq config:
server=/$basedomain/192.168.126.1
address=/.apps.$clustername.$basedomain/192.168.126.1
Doing this, the libvirt dnsmasq keeps managing everything except the apps URLs (which it wouldn't resolve because of this issue); those are forwarded to the KVM host's dnsmasq, which actually works.
You can check my playbook that configures the KVM host here:
https://github.com/luisarizmendi/ocp-libvirt-ipi-role/blob/master/tasks/kvm_deploy.yml
And the playbook that changes the data/data/libvirt/main.tf file here:
https://github.com/luisarizmendi/ocp-libvirt-ipi-role/blob/master/tasks/ocp_deploy.yml
https://gitlab.com/libvirt/libvirt/-/commit/fb9f6ce625322d10b2e2a7c3ce4faab780b97e8d might be a way to add the needed options to the libvirt dnsmasq instance, which would allow all the cluster-related name resolution to happen on 192.168.126.1 rather than having to go through a second dnsmasq instance managed by NetworkManager.
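For reference, with a libvirt new enough to support dnsmasq option passthrough, the idea would look roughly like this in the network XML (a sketch only, reusing the test1/tt.testing names and the .51 ingress node from earlier in the thread; applied with virsh net-edit and a restart of the network):
<network xmlns:dnsmasq='http://libvirt.org/schemas/network/dnsmasq/1.0'>
  ...
  <dnsmasq:options>
    <!-- wildcard ingress record served directly by libvirt's own dnsmasq -->
    <dnsmasq:option value='address=/.apps.test1.tt.testing/192.168.126.51'/>
  </dnsmasq:options>
</network>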
@cfergeau totally agreed. I did some tests on RHEL/CentOS 8 with libvirt 5.6, where libvirt manages all the DNS entries, including *.apps.
https://github.com/RedHat-EMEA-SSA-Team/labs/tree/master/disk-encryption#creating-libvirt-network
Best Regards
To make the feature proposed by @ralvares work when using the Terraform provider for libvirt, the following XSLT transformation can be applied https://github.com/samuelvl/ocp4-disconnected-lab/blob/master/src/dns/libvirt-dns.xml.tpl
resource "libvirt_network" "openshift" {
...
xml {
xslt = data.template_file.openshift_libvirt_dns.rendered
}
}
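One possible shape for such a transform (a sketch only, not the linked template; it is an identity copy that appends a dnsmasq passthrough block, reusing the hypothetical wildcard entry from above):
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dnsmasq="http://libvirt.org/schemas/network/dnsmasq/1.0">
  <!-- identity transform: copy the network XML generated by terraform unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- append a dnsmasq option passthrough block to the <network> element -->
  <xsl:template match="network">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <dnsmasq:options>
        <dnsmasq:option value="address=/.apps.test1.tt.testing/192.168.126.51"/>
      </dnsmasq:options>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>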
Here's a workaround: #1648 (comment)
Last week, while trying to do some basic verification, I ran into an issue where the workaround listed in the installer troubleshooting doc wasn't working. We figured out it was because I had spun up a cluster with three workers, but the ingress controller has 2 set in its replicaset, so neither of those pods landed on the .51 worker and we saw the same symptoms as if no workaround had been applied. It doesn't look like there's a way to do wildcards and have multiple IPs for a host entry; dnsmasq seems to take the last entry in a file as the IP instead of doing any kind of round-robin. Any suggestions? Or do we just need to edit the manifest for the ingress operator to create 3 replicas?
@clnperez I'm running into the same issue. Did you manage to find a solution?
@marshallford no, nothing other than spinning up that 3rd replica for the ingress.
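For anyone hitting the same replica/worker mismatch, scaling the default IngressController so a router pod lands on every worker is one option (adjust the count to your worker count):
oc patch ingresscontroller/default -n openshift-ingress-operator --type merge -p '{"spec":{"replicas": 3}}'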
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close