tailscale-dev/deck-tailscale

System extension fails to merge after a SteamOS update

gustakasn0v opened this issue · 41 comments

After the SteamOS 3.5.5 update release today, I noticed my tailscale binary was missing. Running sudo systemd-sysext merge yields the error No suitable extensions found (1 ignored due to incompatible version).

This is because during installation in https://github.com/tailscale-dev/deck-tailscale/blob/main/tailscale.sh#L49 a systemd extension-release file is created in /var/lib/extensions/tailscale/usr/lib/extension-release.d/extension-release.tailscale with the SteamOS version set as the VERSION_ID field. If the SteamOS version changes, systemd fails to load the extension

My file looked like this

SYSEXT_LEVEL=1.0
ID=steamos
VERSION_ID=3.4.11

After changing the version to 3.5.5 and running sudo systemd-sysext merge, tailscale worked as usual
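For anyone wanting to check for this mismatch before running the merge, here is a minimal sketch that compares the two VERSION_ID values. The helper names (release_field, versions_match) are made up for illustration; the default paths are the ones mentioned above.

```sh
# Print the value of KEY from an os-release style FILE (KEY=value lines).
# Surrounding double quotes, if any, are stripped.
release_field() (
    key=$1
    file=$2
    sed -n "s/^${key}=//p" "$file" | head -n 1 | tr -d '"'
)

# Succeed when the extension's VERSION_ID equals the host's.
# Hypothetical helper; both paths can be overridden for testing.
versions_match() (
    host=${1:-/etc/os-release}
    ext=${2:-/var/lib/extensions/tailscale/usr/lib/extension-release.d/extension-release.tailscale}
    [ "$(release_field VERSION_ID "$host")" = "$(release_field VERSION_ID "$ext")" ]
)
```

Running `versions_match || echo "extension-release is stale"` after an update would flag the situation described here.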

the contained ID= fields have to match unless "_any" is set for the extension. If the extension ID= is not "_any", the SYSEXT_LEVEL= field (if defined) has to match. If the latter is not defined, the VERSION_ID= field has to match instead
https://man.archlinux.org/man/systemd-sysext.8.en

This made me think VERSION_ID only had to match if SYSEXT_LEVEL wasn't defined, so I tried deleting VERSION_ID from the file, but the extension still fails to merge after
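One possible reading of the quoted rule, written out as a sketch. It assumes SYSEXT_LEVEL is only compared when both the host and the extension define it, and that VERSION_ID is compared otherwise. That is an interpretation of the man page, not systemd's actual code, but it would be consistent with the failure above if SteamOS's os-release does not define SYSEXT_LEVEL. The function names are hypothetical.

```sh
# Print the value of KEY ($1) from an os-release style FILE ($2).
release_field() ( sed -n "s/^$1=//p" "$2" | head -n 1 | tr -d '"' )

# Return 0 if the extension would be accepted under this reading of the rule.
# $1: host os-release file, $2: extension-release file.
sysext_compatible() (
    host=$1
    ext=$2
    ext_id=$(release_field ID "$ext")
    # ID=_any skips every other check.
    [ "$ext_id" = "_any" ] && exit 0
    # Otherwise the IDs must match.
    [ "$ext_id" = "$(release_field ID "$host")" ] || exit 1
    host_level=$(release_field SYSEXT_LEVEL "$host")
    ext_level=$(release_field SYSEXT_LEVEL "$ext")
    if [ -n "$host_level" ] && [ -n "$ext_level" ]; then
        # Assumption: levels are compared only when both sides define one.
        [ "$host_level" = "$ext_level" ]
    else
        # Fall back to comparing VERSION_ID.
        [ "$(release_field VERSION_ID "$host")" = "$(release_field VERSION_ID "$ext")" ]
    fi
)
```

Under this reading, an extension with SYSEXT_LEVEL but no VERSION_ID is rejected on a host that only defines VERSION_ID, because the fallback comparison sees an empty value on the extension side.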

Yup! This is kinda-expected behaviour. Should be documented better, but 🤷🏻 . If you re-run the script, it'll regenerate the extension with the new version in the extension-release file.

#11 should fix this permanently, though, because it wouldn't need an extension. I just haven't had energy to devote to it. That's the downside of also being a software developer for work.

If you re-run the script, it'll regenerate the extension with the new version in the extension-release file.

Makes sense. I'll add a note about re-running the script after a SteamOS update to the readme, thanks for clarifying.

I just haven't had energy to devote to it. That's the downside of also being a software developer for work.

Absolutely, I understand 100%, I'm in the same boat 😅

I'm gonna mark this closed; there's a message in the readme about it, and I don't think it'll be possible to make this into a user service and fix the problem long-term that way.

Makes sense. Thanks for the readme note, looks super helpful

From the docs referenced above it looks like the following will not attempt to validate against /etc/os-release allowing the extension to mount after updates.

SYSEXT_LEVEL=1.0
ID=_any

I tried the config above and it seems to mount just fine, which makes me think it would work after an update; the Tailscale binaries are in the correct places and TS comes up after a reboot, but I am not sure how to test this against an actual update.

I'm getting this error though. I'm not quite sure what it is trying to tell me or if it's relevant... everything seems to be working though despite the error, and it is being thrown without any changes.

$: sudo systemd-sysext refresh
Using extensions 'tailscale'.
Failed to mount sysext (type overlay) on /run/systemd/sysext/overlay/usr (MS_RDONLY "lowerdir=/run/systemd/sysext/meta/usr:/run/systemd/sysext/extensions/tailscale/usr:/usr"): Invalid argument

There are probably concerns with setting these values to _any, but if the current suggested flow is to simply run the script every time there is an update this just seems like less hassle.

From dmesg

overlayfs: maximum fs stacking depth exceeded

Sooo I think this is just me misusing systemd-sysext somewhere... After stopping the service, unmerging, then making the change, then remerging/starting everything it is working fine. So my working order of operation is:

$ systemctl stop tailscaled
$ systemctl disable tailscaled
$ systemd-sysext unmerge
$ vim /var/lib/extensions/tailscale/usr/lib/extension-release.d/extension-release.tailscale
< make the change I posted above >
$ systemd-sysext merge
$ systemctl enable tailscaled
$ systemctl start tailscaled

And it seems to work!
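The same order of operations could be wrapped in a small script so the manual vim step is not needed. This is only a sketch: the function names are invented, the extensions root defaults to /var/lib/extensions as in this thread, and remerge_tailscale needs root, so it is defined here but not run.

```sh
# Path of the extension-release file under a given extensions root.
release_path() {
    printf '%s/tailscale/usr/lib/extension-release.d/extension-release.tailscale\n' "$1"
}

# Rewrite the release file so it only contains the two lines posted above.
write_any_release() (
    path=$(release_path "${1:-/var/lib/extensions}")
    mkdir -p "$(dirname "$path")"
    printf 'SYSEXT_LEVEL=1.0\nID=_any\n' > "$path"
)

# Full sequence, mirroring the manual steps. Requires root.
remerge_tailscale() {
    systemctl stop tailscaled &&
    systemctl disable tailscaled &&
    systemd-sysext unmerge &&
    write_any_release &&
    systemd-sysext merge &&
    systemctl enable tailscaled &&
    systemctl start tailscaled
}
```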

I tried the above and it works, thanks @diericx for the findings! I didn't see any errors, and I don't use any system extension or config other than tailscale's so I suspect the error you saw is related to your setup, but I can't be sure.

@legowerewolf if you're okay with this, we can make a PR to 1. change https://github.com/tailscale-dev/deck-tailscale/blob/main/tailscale.sh#L49 to the above 2. remove the README note asking folks to re-run the script after a SteamOS update. Lmk your thoughts, and @diericx lmk if you'd like me to do it and credit you, or you'd like to do it yourself.

P.S. Since https://man.archlinux.org/man/systemd-sysext.8.en says "If the extension ID= is not "_any", the SYSEXT_LEVEL= field (if defined) has to match", it made me think that setting ID=_any means SYSEXT_LEVEL isn't checked. This seems correct: I tried a config file with just ID=_any and it works. Maybe this is because on SteamOS /etc/os-release doesn't define SYSEXT_LEVEL, despite supporting the feature, but I'm not sure.

So perhaps we should just go with that to keep it simple. Lmk your thoughts

Holy crap, yeah! I'll check it out on my machine, but I'm optimistic this'll work!

@gustakasn0v I opened a PR here, which I decided to keep in draft until we can actually test against a real update. I tested by switching between Beta and Preview, which both worked, but neither changed the version in os-release.
#14

I attempted to do what you described (just set ID), but I don't actually see where SYSEXT_LEVEL is getting set.

I can also add the change to the README. :)

As I noted in the PR: Checked it out on my Deck, and it's working good there. I'd be happy to merge that in right now, but if we want to wait and see what happens at the next SteamOS update, I'm cool with that too.

I was just thinking about this a bit more. It seems like this solution creates and relies on the following files in /etc/:

  • /etc/systemd/system/tailscaled.service - systemd unit file
  • /etc/systemd/system/tailscaled.service.d/override.conf - sysext config

Doesn't /etc/ get reset on update? So these would still need to get recreated every update right?

More specifically, here are the lines I'm confused about.

cp -rf $tar_dir/systemd/tailscaled.service /etc/systemd/system

cp -rf override.conf /etc/systemd/system/tailscaled.service.d/override.conf

In the blog post that this repo references (https://tailscale.com/blog/steam-deck), it looks like all files are placed in the extension dir /var/lib/extensions/ and then mounted after. For example it looks like the systemd unit file would be here instead:

/var/lib/extensions/tailscale/usr/lib/systemd/system/tailscaled.service

and here after the overlay is mounted

/usr/lib/systemd/system/tailscaled.service

Ah, nope! Changes get preserved via overlayfs.

Changes in /etc/ are preserved in /var/lib/overlays/etc/upper/ via an overlayfs, meaning that they survive updates.
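A quick way to confirm this on a given device might be to look for an overlay entry in the mount table. The sketch below assumes the /proc/mounts column layout (device, mount point, filesystem type, options); is_overlay_mount is a made-up helper name.

```sh
# Succeed if MOUNTPOINT ($1) appears with filesystem type "overlay"
# in a /proc/mounts style table ($2, defaults to /proc/mounts).
is_overlay_mount() (
    mountpoint=$1
    table=${2:-/proc/mounts}
    awk -v m="$mountpoint" \
        '$2 == m && $3 == "overlay" { found = 1 } END { exit !found }' \
        "$table"
)
```

For example, `is_overlay_mount /etc && echo "/etc is an overlay"`.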

@legowerewolf is that something that is done here or SteamOS applies that overlay? Where is that quote from? That is super interesting.

It's something that SteamOS does that I worked out and put in the readme.

Oh that's perfect!! I had noticed that as well but I wasn't sure what to make of it, I suspected it had something to do with what we were up to here.

This means we could reduce the complexity by doing this without using systemd-sysext. We can just install the binaries in the home directory and then point the systemd unit at these binaries. This would also open the possibility of easily adding an install flow that is completely in userspace meaning we can potentially allow users the option of running this install script and tailscaled without root permissions.

I think the latter point is especially important for a potential fully featured decky plugin. It will be really cool to allow the use of the plugin without having to run any commands with root permissions.

Here is what I'm talking about implemented, I just tested it out.

#15

So then for a userspace install we would just add the flag I mentioned in the other thread (--tun=userspace-networking) to the FLAGS variable in /home/deck/.config/tailscale.defaults, install the systemd unit in the user's home dir at ~/.config/systemd/user/tailscaled.service rather than in /etc/, and enable and start it via systemctl --user enable --now tailscaled.

No sudo required! That would be pretty neat.
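To make the idea concrete, here is a rough sketch of what such a user unit could look like, generated from a shell function. Everything except --tun=userspace-networking is an assumption: the ~/.bin location, the state and socket paths under ~/.local/share/tailscale (that directory would have to exist), and the SOCKS5 listener on localhost:1055 are illustrative choices, not something taken from the repo.

```sh
# Print a user-level unit for tailscaled in userspace-networking mode.
# $1 is the user's home directory (hypothetical layout, see above).
user_unit() {
    cat <<EOF
[Unit]
Description=Tailscale node agent (userspace networking sketch)

[Service]
ExecStart=$1/.bin/tailscaled --tun=userspace-networking --socks5-server=localhost:1055 --state=$1/.local/share/tailscale/tailscaled.state --socket=$1/.local/share/tailscale/tailscaled.sock
Restart=on-failure

[Install]
WantedBy=default.target
EOF
}
```

The output of `user_unit "$HOME"` would be saved under ~/.config/systemd/user/ with a file name matching the unit passed to systemctl --user. The tailscale CLI would then also need the same --socket path to talk to the daemon.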

Thanks for the proposal, sounds neat indeed!

Apologies if I misunderstood the explanation in #11 (comment), but I have a question:

Instead it will need to be changed to this to use the SOCKS5 proxy for outgoing connections:

100.0.0.2 $ nc -lnv 0.0.0.0 9000

100.0.0.1 $ nc -X 5 -x 127.0.0.1:1055 100.0.0.2 9000

Would this mean users would have to configure their Steam Decks to use a SOCKS5 proxy in order to reach devices in their Tailnet? This sounds consistent with https://tailscale.com/blog/steam-deck:

You could also run Tailscale as a portable service, but doing this doesn’t let tailscaled run in kernel networking mode. Portable services have additional constraints similar to Flatpak that make it unviable for use on the Steam Deck. You certainly don’t want to configure everything on your gaming console to use a SOCKS proxy server!

If so, this sounds like a significant trade-off we should discuss. In my mind having to run sudo to install Tailscale trumps having to tinker with Arch/SteamOS to configure a proxy (or worse, having inconsistent access to the Tailnet on a per-app basis), especially if we make the system extension survive updates. But would love to hear your thoughts.

Hm, yeah a full userspace install probably isn't worth it for something like a Steam Deck where most use cases of a VPN are going to be connecting outward to other servers/clients.

My thoughts exactly.

@diericx could you please help us confirm if your proposed userspace setup would indeed require a proxy setup for outbound connections? If so, do you know a way to work around it?

Regarding running tailscale in userspace, those are the limitations yes. Outbound connections attempting to reach a peer will need to go through the hosted SOCKS5 or HTTP server. I don't personally know of a way around that, as iptables and creating a new interface require root access...

Here is my source:

https://tailscale.com/kb/1112/userspace-networking

But just to clarify the PR I mentioned above runs tailscale as root. Was just bringing it up as an idea for potential future work... but I think you're right, userspace tailscaled might be pretty useless for something like a steam deck.

I mean it's possible we could set proxy settings in the network but I would have to look into that. I imagine that would be routing all traffic through the tailscale network which is less than ideal, but if we could specify a subnet or something it might work. Starts to sound pretty hacky/inconvenient though.

Ah my mistake, I thought #15 was userspace-based, but it just switches to using ~/.bin to store the tailscale binaries instead of a system extension. I previously thought we needed to use our own system extension to install the systemd unit file, but I didn't know SteamOS preserves /etc changes out of the box via an overlayfs!

So if I got this right (lmk if I missed anything):

  1. Tailscale binaries can be placed anywhere. Currently we install them in /usr via a system extension, but #15 installs them in ~/.bin
  2. The systemd unit file must be placed in /etc/systemd/system. It's preserved by the SteamOS-managed overlayfs on /etc, nothing extra needed.
  3. By placing the unit file there (instead of ~/.config/systemd/user), tailscaled runs as root, which we want to keep as-is to avoid users having to tinker with proxy settings
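As an illustration of how points 1 to 3 could fit together, a systemd drop-in can repoint the system service at binaries in the home directory. This is a sketch only: the --state and --socket paths are assumed to match a stock tailscaled.service, and the actual change in #15 may look different.

```sh
# Print a drop-in for /etc/systemd/system/tailscaled.service.d/override.conf
# that points the service at binaries in directory $1 (e.g. /home/deck/.bin).
override_conf() {
    cat <<EOF
[Service]
ExecStart=
ExecStart=$1/tailscaled --state=/var/lib/tailscale/tailscaled.state --socket=/run/tailscale/tailscaled.sock
EOF
}
```

The empty ExecStart= line clears the command inherited from the base unit before the new one is set, which systemd requires when overriding ExecStart in a drop-in.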

With that in mind, removing the need for a system extension by moving the files to ~/.bin sounds like a good move as it's simpler. I'm particularly pleased as it no longer messes with steamos-readonly (see #9 for context). Wondering if there are any edge cases that need testing, but I'll defer that to @legowerewolf


P.S. Apologies again, I mixed up the usernames, I thought @legowerewolf had made the "...yeah a full userspace install probably isn't worth it..." comment for some reason. That's why I asked @diericx again about confirming the SOCKS requirement even though they'd already said it. Must stop replying to GitHub comments so late 😂

Yep that looks right to me!

I'm particularly pleased as it no longer messes with steamos-readonly

That's awesome!!

Apologies

haha no worries, it was worth it for the further inquiry.

Awesome, all that's left is for @legowerewolf to decide what direction to take: make the sysext survive updates (#14) or move the files to /home avoiding the need for sysext entirely (#15)

Oh wow, that's a lot to catch up on...
Ok, check my understanding here:

  • #15: If we put the binaries in the user's home dir, we can still manage tailscaled as a system service, so we don't need to do any weird userspace networking hacks. The downside of this is that the tailscale binary is no longer in a standard place, so we need to update the user's bashrc (and coordinate with other tools?) to include wherever we want to put it.
  • #14: If we keep the system extension and add ID=_any to the release file, we still keep tailscaled as a system service. The tailscale binary can pretend to exist in a standard place, so all tools can find it without issue. The downside of this is that it still messes with steamos-readonly.

I've never needed to crack the readonly seal, so my inclination is the second path. Before I make the decision, what're some reasons to crack the readonly seal (vs perhaps using system extensions for other things, too)?

so we need to update the user's bashrc

I suppose it's not really necessary but there might be a better way to get the binary into the user's path. Maybe /etc/profile?

I'm new to the immutable OS paradigm, so maybe this isn't standard or even the best way to think about it... but it seems to me that if we want to install and run a program on an immutable OS and we don't have access to the build cycle in order to add it in /usr/ level, then doesn't it make more sense for it to be installed along with its configuration in user space?

It seems cleaner to me than trying to overlay the binary onto /usr/, but it is a bit weird that it is a system level service running as root pointed at a specific user's home directory...

I've never needed to crack the readonly seal

I don't think either solution cracks the readonly seal of the OS

@legowerewolf yes, you've summarised it correctly. A couple more thoughts:

so we need to update the user's bashrc (and coordinate with other tools?) to include wherever we want to put it.

The Tailscale control decky plugin will need changing, I've documented this in #15 (comment)

I've never needed to crack the readonly seal

I don't think either solution cracks the readonly seal of the OS

@diericx he's referring to the fact that system extensions break steamos-readonly, documented in #9 and in the readme. Fortunately I keep a postUpdate.sh script to run after an update, so I remember my use cases well:

  1. ln -s ~/.bin/rclone /sbin/mount.rclone so I can have rclone mounts (Google Drive, Dropbox) in my /etc/fstab file as described in https://rclone.org/commands/rclone_mount/. There's plenty of alternatives to this (systemd units, Cloud Drive clients, etc) but at the time I was more familiar with fstab. I've now moved to using the Dropbox app on the Discover store
  2. Install dbus-glib using Pacman: It's a long story, but Firefox's flatpak doesn't support native messaging, which is needed for KeePass password managers to work (https://bugzilla.mozilla.org/show_bug.cgi?id=1621763 keepassxreboot/keepassxc-browser#1631). There's a workaround, but it's a lengthy setup and I wasn't able to get it working. The long-term fix is to support native messaging management in Flatpak, but that PR has been open since Feb 2022! I instead unpacked Firefox's tarball in my home directory, which worked fine but as of a few months ago started crashing if dbus-glib isn't installed, which isn't by default on SteamOS (edit this was fixed a few weeks ago, I removed dbus-glib from my system and Firefox still works https://bugzilla.mozilla.org/show_bug.cgi?id=1532281)

My two cents: I'd rather we do the extra work (i.e. coordinate with Tailscale-control on tailscale's location) than have users learn our workaround for system extensions breaking steamos-readonly, a tool officially supported by Valve. In any case, I don't expect there to be other tools that need to know tailscale's location and don't use a shell, so it's an easy fix for us

What if we just do both? Figure out which one we want to be the default install method, make a branch for the other one, and go from there?

Assuming that the script can adequately migrate folks to the homedir-based install, we could make that the default? I still prefer the system extension method, so I'm happy to keep maintaining that, but the default ought to be whatever causes less friction for users.

he's referring to the fact that system extensions break steamos-readonly

Ahh I thought he was referring to having to run sudo steamos-readonly disable

What if we just do both?

That could definitely work!


Regarding getting the binary in the user's path, I just had an idea. Here is my PATH value now:

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/bin:/var/lib/flatpak/exports/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl

And I noticed /var/lib/flatpak/exports/bin. It looks like that is rw, which means it persists across updates, right?

$: mount | grep /var
...
/dev/nvme0n1p8 on /var/lib/flatpak type ext4 (rw,relatime)
...

So we could just link it like so. It is definitely not ideal, but it's the only location in PATH that is not in /usr/:

sudo ln -s ~/.bin/tailscale /var/lib/flatpak/exports/bin/tailscale
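The claim that this is the only PATH entry outside /usr can be checked mechanically. A small sketch follows; path_entries_outside_usr is a hypothetical helper, not part of the install script.

```sh
# Print the entries of a PATH-style string that are not under /usr,
# one per line. Defaults to the current $PATH.
path_entries_outside_usr() (
    printf '%s\n' "${1:-$PATH}" | tr ':' '\n' | grep -v -e '^/usr/' -e '^/usr$'
)
```

With the PATH value shown above, this prints only /var/lib/flatpak/exports/bin.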

I have the tailscale-control plugin but it still doesn't seem to be working with this change. I'm not really sure how to debug that yet.

Or we could just straight up install it there rather than in /home maybe that makes more sense

I'd rather not put it in a directory we know to be managed by a different tool. I don't want to risk breaking flatpak/Discover.

yeah, probably a bad idea. I think we may have finally explored every option lol

Figure out which one we want to be the default install method, make a branch for the other one, and go from there?

I'd suggest having both methods on the same branch, have a CLI flag to choose which, and default to the /home one; this way branches won't diverge too much, which is harder to maintain. But it's entirely up to you

Here is my PATH value now... And noticed /var/lib/flatpak/exports/bin...I have the tailscale-control plugin but it still doesn't seem to be working with this change

On my Steam Deck I get the error /bin/sh tailscale: command not found. This means Decky or the plugin are using sh to run the command, whereas your terminal uses bash; I'd have to check exactly how the $PATH variable is defined on SteamOS, but my guess would be that /var/lib/flatpak/exports/bin is added in a place that only applies to bash.

In any case, we can change this and similar lines in the plugin to something like subprocess.run(['/bin/bash', '-c', 'tailscale'])
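Another option, instead of depending on which shell ends up running the command, would be to resolve the binary's absolute path once and call that. A minimal sketch, assuming the candidate directories are passed in by the caller; find_tailscale is an invented name.

```sh
# Print the first tailscale binary found: PATH is tried first,
# then each directory given as an argument, in order.
find_tailscale() (
    if found=$(command -v tailscale 2>/dev/null); then
        printf '%s\n' "$found"
        exit 0
    fi
    for dir in "$@"; do
        if [ -x "$dir/tailscale" ]; then
            printf '%s\n' "$dir/tailscale"
            exit 0
        fi
    done
    # Nothing found anywhere.
    exit 1
)
```

For example, `find_tailscale "$HOME/.bin" /var/lib/flatpak/exports/bin` would return a path that works regardless of how $PATH is set up for sh versus bash.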

With separate branches, git 'remembers' which installation method you were using (because you checked out that branch). I'll branch off and take #14 into the new branch, and then we can take #15 into main.

For future tracking purposes: I raised saumya-banthia/tailscale-control#8 on the Decky plugin side to use the new install path introduced in #15. I'll work with the plugin maintainer to get this merged and published into the Decky store

With the PR merged, let me just say: this was one of my first relatively-long open source collabs, and I had a lot of fun and learning in it! Thanks @diericx and @legowerewolf

Yeah this was super fun! I'm glad we were able to find a clean solution :)