Extra tmpfs mount on /sys/fs/cgroup causing problems with Hashicorp Nomad cgroup version detection
googol opened this issue · 13 comments
Currently on unraid 6.12.6, the path /sys/fs/cgroup
has two mounts:
# mount -l | grep cgroup
cgroup_root on /sys/fs/cgroup type tmpfs (rw,relatime,size=8192k,mode=755,inode64)
none on /sys/fs/cgroup type cgroup2 (rw,relatime)
As I've come to understand this, the tmpfs mount is something that is normally only used for cgroup v1, and for v2 it is not necessary at all (it is completely hidden by the actual cgroup2 mount). These mounts come from /etc/rc.d/rc.S
on my system, and this repository seems to host the same snippet in /etc/rc.d/rc.S.cont
.
This is not a problem for actual usage of cgroup v2, with version 1.7.0 Hashicorp Nomad changed the way they detect the supported cgroup version, to check the first mount type on the path /sys/fs/cgroup
. Since cgroup v1 is at least typically configured with a tmpfs, Nomad 1.7.x versions are now tripping to thinking unraid's cgroup version is 1, and failing all jobs I'm trying to run.
Could the tmpfs mount be removed in the cgroup v2 branch of the init script?
Some links of previous investigations:
- Nomad issue tracker ticket: hashicorp/nomad#19735
- An attempt at finding the correct place to ask questions about this on the community forum: https://forums.unraid.net/topic/151457-6126-odd-cgroup-mount-point-causing-problems-with-hashicorp-nomad/
Hi @limetech, I hope you don't mind me tagging you directly. Would you have an idea about this or could you tag in someone who could help with this?
Let me start by saying I personally am not the right person to help with this, but I have raised awareness of the issue. Thank you for starting in the forums, and for the link to the issue with the technical details.
To give you a little more info on this repo...
The 6.12 branch in this repo corresponds to the 6.12.6 version of Unraid on your server. This is the release where we started moving /etc/rc.d/ scripts into the repo but it does not contain everything, for instance you won't find rc.S in that branch.
The master branch corresponds to our next major release, and we have moved rc.S and others into the repo to make collaborating on them easier. There have been many changes so the master branch is quite different from what is on your server.
tmpfs mount is something that is normally only used for cgroup v1, and for v2 it is not necessary at all (it is completely hidden by the actual cgroup2 mount).
That's true but it makes no difference if you mount it on a tmpfs and cgroup v2 is working correctly on Unraid as far as I'm concerned, also Docker and to that extent LXC is detecting the correct cgroup version.
This is not a problem for actual usage of cgroup v2, with version 1.7.0 Hashicorp Nomad changed the way they detect the supported cgroup version
Sorry I'm not familiar with Hashicorp Nomad.
Usually you would do something like the following in bash to detect the cgroup:
stat -f -c %T /sys/fs/cgroup
, cat /proc/filesystems | grep cgroup2
or actually check the hierarchy of /sys/fs/cgroup
Since cgroup v1 is at least typically configured with a tmpfs
That's true but that doesn't mean that other Distributions maybe not also mount cgroup v2 as a tmpfs to maybe limit the size of the mount point itself <- however that shouldn't matter much because it would be very hard to reach the 8M that are set for cgroup2 tmpfs on Unraid.
Could the tmpfs mount be removed in the cgroup v2 branch of the init script?
This needs further discussion and I would also like to discuss this with @limetech
cgroup v1 and cgroup v2 have a completely different hierarchy in /sys/fs/cgroup and that should be possible to detect.
My question would be why and how did Hashicorp changed the way to detect the cgroup version? Where there some issues?
Would this problem be really solved if the tmpfs is removed?
Nomad issue tracker ticket:
BTW: I saw that you where referring to the default Slackware rc.S
but default Slackware is using cgroup v1 and not cgroup v2 that's why it's different.
My question would be why and how did Hashicorp changed the way to detect the cgroup version? Where there some issues?
I don't have the full context, but they did a complete overhaul of their use of cgroups in 1.7.0 for better NUMA system scheduling (release, MR)
Would this problem be really solved if the tmpfs is removed?
The problem definitely goes away if the tmpfs is removed. The new logic in Nomad is to find the first mount for /sys/fs/cgroup
that has either type tmpfs
or cgroup2
, and returns either v1 or v2 respectively. So if there's no tmpfs mount underlying the cgroup2 mount, it will be correctly detected.
I will try to argue to the Nomad team that they should change the logic anyways (to maybe ignore the tmpfs mount if there's also a cgroup2 mount), but I don't really expect to get very much priority for that change, since as far as I know I'm the only user that's hitting this issue and I'm just doing this on a homelab :D most distros use systemd and systemd creates the cgroup mount without the tmpfs underneath, so this just works for them
BTW: I saw that you where referring to the default Slackware rc.S but default Slackware is using cgroup v1 and not cgroup v2 that's why it's different.
I thought so as well :)
The problem definitely goes away if the tmpfs is removed. The new logic in Nomad is to find the first mount for
/sys/fs/cgroup
that has either typetmpfs
orcgroup2
, and returns either v1 or v2 respectively. So if there's no tmpfs mount underlying the cgroup2 mount, it will be correctly detected.
I think this should do the job also fine and circumvent that even if looking at /sys/fs/cgroup
(in my opinion this is not the best way to detect the cgroup version):
func scan(in io.Reader) Mode {
scanner := bufio.NewScanner(in)
for scanner.Scan() {
tokens := set.From(strings.Fields(scanner.Text()))
if tokens.Contains("/sys/fs/cgroup") {
if tokens.Contains("cgroup2") {
return CG2
} else if tokens.Contains("tmpfs") {
return CG1
}
}
}
return OFF
}
I will try to argue to the Nomad team
I already made a draft here for Unraid: #1610
I thought so as well :)
It was mainly added for LXC to be honest since it limited some functionality from LXC.
I don't think flipping the cgroup2 and tmpfs cases helps, since the mounted filesystems are read line-by-line, and the tmpfs line comes before the cgroup2 line, so it will still return CG1
I already made a draft here for Unraid: #1610
That's already looking good to me!
It was mainly added for LXC to be honest since it limited some functionality from LXC.
I'm really happy that it's available, since Nomad really wants to use cgroup2 functionality.
Thanks for the attention on this!
I see the change was merged already, thanks! Any timeline for when to expect a new release with that patch included?
When the next public RC/beta from Unraid is released.
SOON™ ;)
Unraid 6.12.7-rc2 is now available:
https://forums.unraid.net/bug-reports/prereleases/unraid-os-version-6127-rc2-available-r2840/
Cool, I'll give it a go later today!
With 6.12.7-rc2 the latest Nomad runs happily! Thanks to both of you for being so helpful!
@googol 6.12.8 was released a few hours ago, can you test if it is working for you and close this issue when everything works as expected now?
I've updated to 6.12.8 and Nomad is running happily. As the fix is in a stable version now, I'm happy to close this. Thanks again!