openshift/installer

libvirt: cannot set master node vm memory

ValentinoUberti opened this issue · 12 comments

Version

$ openshift-install version
./openshift-install unreleased-master-1452-g6e2977c740853e842247b55c7cd08d9b350f3e93-dirty
built from commit 6e2977c740853e842247b55c7cd08d9b350f3e93
release image registry.svc.ci.openshift.org/origin/release:4.2

Platform:

libvirt

What happened?

The domainMemory value in {INSTALL_DIR}/openshift/99_openshift-cluster-api_master-machines-0.yaml is not used when creating the master VM. The master VM always has 6G of RAM.

What you expected to happen?

A change to domainMemory should be reflected when the master VM is created.

How to reproduce it ?

  1. Create manifests:
    ./openshift-install create manifests --dir=./ocp4
  2. vi ocp4/openshift/99_openshift-cluster-api_master-machines-0.yaml
  3. Change the value of domainMemory
  4. Build the cluster: openshift-install create cluster --dir=./ocp4
  5. After a while, check the master VM info with libvirt:
    sudo virsh dommemstat <master-vm-name>
    The value of "actual" should match the domainMemory value, but it does not.
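Steps 2 and 3 can also be scripted instead of edited by hand. This is only a rough sketch: the helper names set_domain_memory and get_domain_memory are made up for illustration, and it assumes the manifest holds a single line of the form "domainMemory: <number>" with the value in MiB.

```shell
#!/bin/sh
# Hypothetical helper (not part of the installer): set domainMemory (MiB)
# in a generated machine manifest.
set_domain_memory() {
    file=$1
    mib=$2
    # Replace only the number after "domainMemory:", keeping the indentation.
    # A backup of the original file is left next to it as "$file.bak".
    sed -E -i.bak "s/^([[:space:]]*domainMemory:[[:space:]]*)[0-9]+/\1${mib}/" "$file"
}

# Hypothetical helper: print the domainMemory value currently in a manifest.
get_domain_memory() {
    awk '$1 == "domainMemory:" { print $2 }' "$1"
}
```

For example: set_domain_memory ocp4/openshift/99_openshift-cluster-api_master-machines-0.yaml 8192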

/label platform/libvirt

Not sure yet who uses this config in the end (is it used at all?), but the master is created through terraform, and the terraform config respects the libvirt_master_memory variable (which translates to the TF_VAR_libvirt_master_memory environment variable). So there is a way to specify the memory at least, and that is the official way AFAIK.
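As a rough illustration of that workaround, the variable can be set just for the installer process. The wrapper name with_master_memory is hypothetical, and the value is assumed to be in MiB, matching the 6144 default discussed in this issue.

```shell
#!/bin/sh
# Hypothetical wrapper: run any command with TF_VAR_libvirt_master_memory
# set in its environment, without exporting it into the calling shell.
with_master_memory() {
    mib=$1
    shift
    env TF_VAR_libvirt_master_memory="$mib" "$@"
}

# Example invocation (not executed here):
#   with_master_memory 8192 ./openshift-install create cluster --dir=./ocp4
```

Terraform reads environment variables named TF_VAR_<name> as values for the input variable <name>, which is why this reaches libvirt_master_memory.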

Having said that, we need to check what's up with openshift/99_openshift-cluster-api_master-machines-0.yaml file.

/label platform/libvirt

So it turns out this is the config for machines created by the libvirt cluster API provider, but the master is created by the Installer through terraform. So this config ends up being used for workers instead of the master. It's a bit confusing because it's not obvious at all.

Small correction: for worker nodes, we have the openshift/99_openshift-cluster-api_worker-machines-0.yaml file, and that is respected. If you create a master node through the libvirt provider (after the Installer is done), it will respect the values provided in the openshift/99_openshift-cluster-api_master-machines-0.yaml file.

So the problem is that the first master node is created by the Installer, which only uses the terraform configuration. I think it should respect the manifest config somehow.

Thank you. I get some OOM errors on the master node, which is why I tried to change the master node memory. The Installer stops at 98% on Fedora 30.

Hi @zeenix. The openshift/99_openshift-cluster-api_worker-machines-0.yaml file contains the workers' MachineSet, and if you change the replicas you do get the expected number of nodes, but changes to the memory settings don't work there either.

I think I have found the missing connection to the worker behavior.

The following provider function is used to define defaults for the Machines:
https://github.com/openshift/installer/blob/master/pkg/asset/machines/libvirt/machines.go#L60-L80

Here we can see the default values for cpu and the 6144 memory value. The provider function returns a *libvirtprovider.LibvirtMachineProviderConfig struct that is then used in the MachineSets() function in the same package: https://github.com/openshift/installer/blob/master/pkg/asset/machines/libvirt/machinesets.go#L16-L78.

I can confirm that after changing the DomainMemory field to 8192 in the provider function, the generated manifests pick up that value.
https://github.com/openshift/installer/blob/master/pkg/asset/machines/libvirt/machines.go#L66

The inspected manifests are:

  • 99_openshift-cluster-api_master-machines-0.yaml
  • 99_openshift-cluster-api_worker_machineset-0.yaml

However, on cluster creation the master and worker virtual machines are created with different values:

  • master: 6144
  • worker: 8192

This is probably because the master is managed through the terraform variables, as @zeenix pointed out, and the worker by a MachineSet whose machine config is passed through the *libvirtprovider.LibvirtMachineProviderConfig struct mentioned above.
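For anyone who wants to repeat the comparison, here is a small sketch of reading the per-VM values. It assumes virsh dommemstat prints an "actual" line in KiB (as the virsh man page describes), while the manifest and terraform values are in MiB; the helper name actual_mib is made up.

```shell
#!/bin/sh
# Hypothetical helper: read `virsh dommemstat` output on stdin and print the
# "actual" balloon size converted from KiB to MiB.
actual_mib() {
    awk '$1 == "actual" { print int($2 / 1024) }'
}

# Example invocation (not executed here; substitute the real domain name
# from `sudo virsh list`):
#   sudo virsh dommemstat <domain> | actual_mib
```

With this conversion, a master created with the defaults would report 6144 and a worker built from the modified provider function 8192, which matches the values listed above.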

After this first attempt I rebuilt the installer after updating the memory value in the data/data/libvirt/variables-libvirt.tf file, which provides the default Terraform variables.
https://github.com/openshift/installer/blob/master/data/data/libvirt/variables-libvirt.tf#L35

This time I can confirm that both master and worker are created with 8192 MiB of RAM.

I talked to @abhinavdahiya about this, and it seems that on other platforms the master machine objects provide the vCPU and memory values to terraform. So we need to do the same for libvirt. I'll look into this next week.

should be fixed by #2399

/close

@abhinavdahiya: Closing this issue.