vmware/photon

Photon 3.0 cgroup exhaustion. VM shuts down with CPU disabled

jkevinezz opened this issue · 16 comments

Describe the bug

We use Photon 3.0 as the OS for our Tanzu K8s nodes. At random times we see a "cgroup exhaustion" error, then the VM reboots, and the vCenter events show that the CPU has been disabled.

Reproduction steps

...
We haven't been able to reproduce it manually; it just happens randomly.

Expected behavior

How do we find out what is causing the cgroup exhaustion, which in turn causes the Photon kernel to disable the CPU and reboot itself?

Additional context

No response

Hi,
as a volunteer here: have you tried opening an SR at https://www.vmware.com/go/customerconnect ? See the SAM offering.
For TKG, you could collect all log files as described in kb90319. Also, have a look at the VMware Tanzu Compliance Product Documentation.
It could be a subcomponent bug and/or a resource limit without burst capability, but without logs and compliance status that is only a guess.
Hope this helps.

orchestrating 8 cases++ /cc @Vasavisirnapalli

@jkevinezz ,

Which kernel version are you using?
Do you see cgroup.memory=nokmem in the output of cat /proc/cmdline?
Could you please share the kernel logs?

Thanks.

I cannot see any log snippet.
Check the kernel version via uname -a.
The kernel logs are the output of the dmesg command.
Also run cat /proc/cmdline to check whether the cgroup.memory=nokmem parameter is present.
We suspect it may be an older kernel issue that was fixed by commit
1c4e936
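A minimal sketch of the cgroup.memory=nokmem check requested above, for anyone who wants to script it across nodes. The function name nokmem_enabled is hypothetical, and the parsing assumes the option may appear either alone or in a comma-separated list (for example cgroup.memory=nokmem,nosocket).

```python
def nokmem_enabled(cmdline: str) -> bool:
    """Return True if the kernel command line carries cgroup.memory=nokmem."""
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # cgroup.memory accepts a comma-separated list of options.
        if key == "cgroup.memory" and "nokmem" in value.split(","):
            return True
    return False


if __name__ == "__main__":
    # Same data as `cat /proc/cmdline`; only meaningful on a Linux node.
    with open("/proc/cmdline") as f:
        print("cgroup.memory=nokmem present:", nokmem_enabled(f.read()))
```

This only reports whether the parameter is set; it says nothing about whether the running kernel already contains the fix mentioned above.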

@prashant1221 what about the 'kernel panic' fix f029de1, in correlation to 'node random reboot' and features eligible for 3 datacenter? Here is a patch filtering attempt using keywords.

Also, can you please share the output of slabtop -sc --once from the nodes that experience this issue often?
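If slabtop is not installed on a node, roughly the same ranking can be derived from /proc/slabinfo. The sketch below is an approximation under stated assumptions: it expects the slabinfo version 2.1 column layout, estimates each cache's size as num_slabs × pagesperslab × page size, and the function name parse_slabinfo is hypothetical. Reading /proc/slabinfo normally requires root.

```python
import os


def parse_slabinfo(text: str, page_size: int = 4096):
    """Return (cache_name, approx_bytes) pairs, largest cache first."""
    caches = []
    for line in text.splitlines():
        # Skip the version line, the column header, and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        try:
            slabdata = fields.index("slabdata")
            pages_per_slab = int(fields[5])
            num_slabs = int(fields[slabdata + 2])
        except (ValueError, IndexError):
            continue  # Unexpected layout; ignore the line.
        caches.append((fields[0], num_slabs * pages_per_slab * page_size))
    return sorted(caches, key=lambda c: c[1], reverse=True)


if __name__ == "__main__":
    with open("/proc/slabinfo") as f:
        ranked = parse_slabinfo(f.read(), os.sysconf("SC_PAGE_SIZE"))
    for name, size in ranked[:20]:
        print(f"{size / 1024:12.0f} KiB  {name}")
```

The actual slabtop output is still preferable for the issue, since that is what was asked for.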

@jkevinezz fyi

According to the Photon OS – Planned End of Support Schedule, an upgrade from Photon OS 3 is recommended.

Patching and updating as a continuous activity has been addressed in recent years through a number of improvements.

The current docs provide a short description of the upgrade process, which is very easy, by the way.

In-place migrations, including FIPS mode, BIOS->UEFI, kernel, Docker, Kubernetes, etc., were, as far as I know, never worked through systematically in the docs for the open-source version of Photon OS.
My bad, the doc attempt here was somewhat insufficient, and no other attempts have been made since then.
As all software is a continuous pulse of additions and deletions, yes, there were a few issues as well, e.g. 1244, 1226, 1234, 1420.

Having said that, with your VMware Support Account Manager, populating a migration path solution should be considered,
using

As soon as the content of your Tanzu K8S nodes is sort of sbom'ified and populated for the migration, planning it for the maintenance schedule gets easy.

Here are some thoughts.

You have a pod with a memory limit of 614400kB and a total-vm of 22318872kB. The memory limit is reached (mem_cgroup_out_of_memory), and as memory consumption keeps growing the OOM (out of memory) killer kicks in. The first process selected is 29860. The problem is that afterwards oom_reaper reports that nothing was gained, see "now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB". This happens for the whole cascade of processes 29860, 30411, 26868, 29920, 4274, 3243, 15616, 9308, 2625, 19388. So, why just pain and no gain? Unfortunately, I'm not skilled enough to read the slabtop output.
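To list such a cascade without reading the log by eye, the oom_reaper lines can be pulled out of the dmesg output. This is only a sketch: the regular expression assumes the usual kernel message layout ("oom_reaper: reaped process PID (name), now anon-rss:…kB, file-rss:…kB, shmem-rss:…kB"), and the function name reaped_processes is hypothetical.

```python
import re

# Assumed layout of the kernel's oom_reaper message.
REAPED = re.compile(
    r"oom_reaper: reaped process (\d+) \((.*?)\), "
    r"now anon-rss:(\d+)kB, file-rss:(\d+)kB, shmem-rss:(\d+)kB"
)


def reaped_processes(log_text: str):
    """Return one dict per reaped process, in log order."""
    results = []
    for m in REAPED.finditer(log_text):
        anon, file_, shmem = (int(m.group(i)) for i in (3, 4, 5))
        results.append(
            {
                "pid": int(m.group(1)),
                "comm": m.group(2),
                # RSS still attributed to the process after reaping.
                "remaining_kb": anon + file_ + shmem,
            }
        )
    return results


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for entry in reaped_processes(sys.stdin.read()):
        print(entry["pid"], entry["comm"], f'{entry["remaining_kb"]}kB')
```

Piping dmesg into the script prints one line per reaped process, which makes it easier to compare the sequence across nodes.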
The Kubernetes issue "Container Limit cgroup causing OOMkilled" is still open.
'Tasks state (memory values in pages)' doesn't list RssAnon (size of resident anonymous memory), RssFile (size of resident file mappings) and RssShmem (size of resident shared memory). This has been addressed recently in a commit.
In addition, this also happens on newer kernel versions in setups with cgroup v1, see Bugzilla bug 207273. By the way, cgroup v2 was introduced a while ago, see https://kubernetes.io/docs/concepts/architecture/cgroups/#using-cgroupv2. Photon OS 4 and 5 support cgroup v2.
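To see which cgroup version a node is actually running, one common heuristic is to look for the cgroup.controllers file at the root of the cgroup mount, which exists only on a unified (v2) hierarchy. The sketch below assumes the default mount point /sys/fs/cgroup; the function name cgroup_version is hypothetical, and a hybrid setup would be reported as v1.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup"):
    """Return 2 for a unified cgroup v2 mount, 1 otherwise, None if root is missing."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    # cgroup.controllers exists at the root only on a cgroup v2 hierarchy.
    return 2 if (root_path / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print("cgroup version:", cgroup_version())
```

The Kubernetes page linked above describes an equivalent check with stat -fc %T /sys/fs/cgroup/.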