target/lorri

nixpkgs `services.lorri` requires reboot for the user lorri.socket to appear

Profpatsch opened this issue · 11 comments

Describe the bug

When putting systemd.lorri.enable = true; in your system config, you would assume that the socket unit appears immediately after a system switch, but in fact it only does after reboot. Logging out and into the user account does not help.

To Reproduce

  • Remove any lorri from your system
  • If you had the service active, remove from your config and reboot
  • Add lorri.service.enable = true; and switch your system (at least nixos 20.03)
  • systemctl --user does not have any lorri service or socket
  • reboot
  • systemctl --user has the lorri.socket now

Expected behavior

The socket file should appear immediately after the NixOS system switch.

Additional context

Grepping through nixpkgs for systemd.user.sockets (and similar) only yields like three results, so it might be that starting a .socket unit for the user daemon is just not a common use-case and we’re in uncharted territory.

Possible solutions I can think of:

  • Can we set up lorri as system service instead of user service? Would there be cons?
  • Document that a reboot is necessary. That’s meh
  • Fix the systemd.user config to update on switch. Probably hard, might even be impossible.

cc @curiousleo

Doesn’t require reboot, just a reload of the user daemon:

systemctl --user daemon-reload

This is a regression in nixpkgs: NixOS/nixpkgs@b20a0e4#r38459106

@Profpatsch I'd like to ask a (possibly naive) question about this issue.

Why is the socket required at all? It seems like the weird interaction with systemd sockets is one of the major pain points for new users. As such, would it not be better to just have the very resource-light lorri daemon command always be running? Most other user services e.g. on NixOS already seem to operate in this manner.

As such, would it not be better to just have the very resource-light lorri daemon command always be running? Most other user services e.g. on NixOS already seem to operate in this manner.

the .socket file should just be a small improvement over the service file, true. Maybe we should remove it for less confusion.

However I’m fairly sure that the issue above pertains to the .service file as well.

I have an issue I think is related where the lorri socket doesn't seem to reactivate after stopping it or reloading it:

[cody@nixos:~]$ systemctl --user daemon-reload

[cody@nixos:~]$ systemctl --user status lorri.service 
● lorri.service - lorri build daemon
     Loaded: loaded (/nix/store/8241pdh8fvrwg8ava3476rnb3050laqc-home-manager-f>
     Active: inactive (dead) since Tue 2020-06-09 18:14:09 CDT; 7min ago
TriggeredBy: ● lorri.socket
   Main PID: 2100 (code=killed, signal=TERM)

Jun 09 18:12:43 nixos systemd[1853]: Started lorri build daemon.
Jun 09 18:12:43 nixos lorri[2100]: Jun 09 18:12:43.338 INFO ready
Jun 09 18:13:24 nixos lorri[2100]: Jun 09 18:13:24.154 INFO build status, messa>
Jun 09 18:13:33 nixos lorri[2100]: Jun 09 18:13:33.297 INFO build status, messa>
Jun 09 18:13:41 nixos lorri[2100]: Jun 09 18:13:41.941 INFO build status, messa>
Jun 09 18:13:45 nixos lorri[2100]: Jun 09 18:13:45.933 INFO build status, messa>
Jun 09 18:14:09 nixos systemd[1853]: Stopping lorri build daemon...
Jun 09 18:14:09 nixos systemd[1853]: lorri.service: Succeeded.
Jun 09 18:14:09 nixos systemd[1853]: Stopped lorri build daemon.

[cody@nixos:~]$ cd smurf
direnv: loading ~/smurf/.envrc
Jun 09 18:22:01.771 INFO lorri daemon is not running, loading a cached environment, expr: /home/cody/smurf/shell.nix
direnv: export +AR +AS +CC +CONFIG_SHELL +CXX +HOST_PATH +IN_NIX_SHELL +LD +NIX_BINTOOLS +NIX_BINTOOLS_WRAPPER_x86_64_unknown_linux_gnu_TARGET_HOST +NIX_BUILD_CORES +NIX_BUILD_TOP +NIX_CC +NIX_CC_WRAPPER_x86_64_unknown_linux_gnu_TARGET_HOST +NIX_ENFORCE_NO_NATIVE +NIX_GHC +NIX_GHCPKG +NIX_GHC_DOCDIR +NIX_GHC_LIBDIR +NIX_HARDENING_ENABLE +NIX_INDENT_MAKE +NIX_LDFLAGS +NIX_LOG_FD +NIX_STORE +NM +OBJCOPY +OBJDUMP +RANLIB +READELF +SIZE +SOURCE_DATE_EPOCH +STRINGS +STRIP +allowSubstitutes +buildInputs +builder +configureFlags +depsBuildBuild +depsBuildBuildPropagated +depsBuildTarget +depsBuildTargetPropagated +depsHostHost +depsHostHostPropagated +depsTargetTarget +depsTargetTargetPropagated +doCheck +doInstallCheck +extraClosure +installPhase +name +nativeBuildInputs +origArgs +origBuilder +origExtraClosure +origOutputs +origPATH +origSystem +out +outputs +patches +phases +preHook +preferLocalBuild +propagatedBuildInputs +propagatedNativeBuildInputs +shell +shellHook +stdenv +strictDeps +system ~LOCALE_ARCHIVE ~PATH

[cody@nixos:~/smurf]$ systemctl --user status lorri.service 
● lorri.service - lorri build daemon
     Loaded: loaded (/nix/store/8241pdh8fvrwg8ava3476rnb3050laqc-home-manager-f>
     Active: inactive (dead) since Tue 2020-06-09 18:14:09 CDT; 7min ago
TriggeredBy: ● lorri.socket
   Main PID: 2100 (code=killed, signal=TERM)

Jun 09 18:12:43 nixos systemd[1853]: Started lorri build daemon.
Jun 09 18:12:43 nixos lorri[2100]: Jun 09 18:12:43.338 INFO ready
Jun 09 18:13:24 nixos lorri[2100]: Jun 09 18:13:24.154 INFO build status, messa>
Jun 09 18:13:33 nixos lorri[2100]: Jun 09 18:13:33.297 INFO build status, messa>
Jun 09 18:13:41 nixos lorri[2100]: Jun 09 18:13:41.941 INFO build status, messa>
Jun 09 18:13:45 nixos lorri[2100]: Jun 09 18:13:45.933 INFO build status, messa>
Jun 09 18:14:09 nixos systemd[1853]: Stopping lorri build daemon...
Jun 09 18:14:09 nixos systemd[1853]: lorri.service: Succeeded.
Jun 09 18:14:09 nixos systemd[1853]: Stopped lorri build daemon.

I am seeing a similar thing as @codygman
In my case there is no socket nor service even after switching to services.lorri.enable = true and restarting.
Using nixos-unstable.

pepe@arrow ~> systemctl --user  list-units  | grep lorri
pepe@arrow ~> systemctl --user  list-sockets --all
LISTEN                      UNIT              ACTIVATES
/run/user/1000/bus          dbus.socket       dbus.service
/run/user/1000/pipewire-0   pipewire.socket   pipewire.service
/run/user/1000/pulse/native pulseaudio.socket pulseaudio.service

We are swallowing varlink errors in the check, so I assume there is a varlink error message here that we silently ignore:

let ping_sent = if let Ok(connection) = varlink::Connection::with_address(&address) {
use rpc::VarlinkClientInterface;
rpc::VarlinkClient::new(connection)
.watch_shell(shell_nix)
.call()
.is_ok()
} else {
false
};

cc @curiousleo @nyarly
We should really pay attention to not swallow any errors, it usually hides important bugs down the road.

Lemme write a patch that shows what is going on.

Please check out #441 and try reproducing the problem after running lorri self-upgrade local /path/to/lorri/checkout @codygman, @pepeiborra. It should print a better stacktrace then and we can debug what the problem is.

lorri self-upgrade won't work for me.

I updated to the latest with niv though in this commit. It seems to me that for some reason the new symbolic link to lorri isn't being loaded into systemd. This is similar to me in how I have to restart emacs after installing packages with nix because it's a wrapper that modifes the load path. I'm not sure the way to workaround this issue here though.

Here are the results of debugging I mostly pasted from this commit:

[cody@nixos:~/hci/testdata/haskell-nix-stack-workflow]$ journalctl --user -u lorri -f 
-- Logs begin at Sun 2019-11-17 17:48:35 CST. --
Jul 12 15:27:31 nixos systemd[1524]: Started lorri build daemon.
Jul 12 15:27:31 nixos systemd[17166]: lorri.service: Failed to execute command: No such file or directory
Jul 12 15:27:31 nixos systemd[17166]: lorri.service: Failed at step EXEC spawning /nix/store/94phsz8nvlmbm0gsp5hrkkmmaark4jia-source/bin/lorri: No such file or directory
Jul 12 15:27:31 nixos systemd[1524]: lorri.service: Main process exited, code=exited, status=203/EXEC
Jul 12 15:27:31 nixos systemd[1524]: lorri.service: Failed with result 'exit-code'.
Jul 12 15:27:31 nixos systemd[1524]: lorri.service: Scheduled restart job, restart counter is at 5.
Jul 12 15:27:31 nixos systemd[1524]: Stopped lorri build daemon.
Jul 12 15:27:31 nixos systemd[1524]: lorri.service: Start request repeated too quickly.
Jul 12 15:27:31 nixos systemd[1524]: lorri.service: Failed with result 'exit-code'.
Jul 12 15:27:31 nixos systemd[1524]: Failed to start lorri build daemon.

Indeed, that doesn't exist:

[cody@nixos:~/hci]$ ls /nix/store/94phsz8nvlmbm0gsp5hrkkmmaark4jia-source/bin/lorri
ls: cannot access '/nix/store/94phsz8nvlmbm0gsp5hrkkmmaark4jia-source/bin/lorri': No such file or directory

Here's the lorri unit file:

[cody@nixos:~/hci]$ systemctl --user cat lorri
# /nix/store/nm14xpfpd2pirj1v2pvpd1kwk1yi02mh-home-manager-files/.config/systemd/user/lorri.service
[Service]
Environment=PATH=/nix/store/15wpim5v22189mrfrm6n8ggg410n3rvi-nix-2.3.6/bin:/nix/store/r1jr2v6afyhc19yx9i1ck230x9987c6q-git-2.27.0/bin:/nix/store/ldh4iwq6dxsv6cj78gv060wq9xa17n1h-gnutar-1.32/bin:/nix/store/yir8n5jfmcdmf0hyk712kl97krmdshlg-gzip-1.10/bin
ExecStart=/nix/store/7988ydmqdwk1p64sjysmi06bq523nbac-lorri/bin/lorri daemon
PrivateTmp=true
ProtectHome=read-only
ProtectSystem=strict
Restart=on-failure

[Unit]
After=lorri.socket
Description=lorri build daemon
RefuseManualStart=true
Requires=lorri.socket

Hmm:

ExecStart=/nix/store/7988ydmqdwk1p64sjysmi06bq523nbac-lorri/bin/lorri daemon

Well.. that is right

[cody@nixos:~/hci]$ /nix/store/7988ydmqdwk1p64sjysmi06bq523nbac-lorri/bin/lorri --version
lorri 1.1.0

So the errors above aren't relevant... so then why isn't the socket
activating? Let's filter the logs:

[cody@nixos:~/hci]$ journalctl --user -u lorri --since='5 minutes ago'
-- Logs begin at Sun 2019-11-17 17:48:35 CST, end at Sun 2020-07-12 15:42:36 CDT. --
-- No entries --

I cd into a directory that uses lorri:

[cody@nixos:~/hci/testdata]$ cd haskell-nix-stack-workflow/
direnv: loading ~/hci/testdata/haskell-nix-stack-workflow/.envrc
Jul 12 15:45:51.351 WARN lorri daemon is not running and this project has not yet been evaluated, please run `lorri daemon`, expr: /home/cody/hci/testdata/haskell-nix-stack-workflow/shell.nix
direnv: export +IN_NIX_SHELL

Okay... direnv allow in my experince has never failed to ping lorri so
I'll try that:

[cody@nixos:~/hci/testdata/haskell-nix-stack-workflow]$ direnv allow
direnv: loading ~/hci/testdata/haskell-nix-stack-workflow/.envrc
Jul 12 15:46:17.355 WARN lorri daemon is not running and this project has not yet been evaluated, please run `lorri daemon`, expr: /home/cody/hci/testdata/haskell-nix-stack-workflow/shell.nix
direnv: export +IN_NIX_SHELL

Surely we've generated some logs by now:

[cody@nixos:~/hci/testdata/haskell-nix-stack-workflow]$ journalctl --user -u lorri --since='5 minutes ago'
-- Logs begin at Sun 2019-11-17 17:48:35 CST, end at Sun 2020-07-12 15:42:36 CDT. --
-- No entries --

Okay.. what if we do a reload as recommended in #374 (comment)

[cody@nixos:~/hci/testdata/haskell-nix-stack-workflow]$ systemctl --user daemon-reload
[cody@nixos:~/hci/testdata/haskell-nix-stack-workflow]$ journalctl --user -u lorri --since='5 minutes ago'
-- Logs begin at Sun 2019-11-17 17:48:35 CST, end at Sun 2020-07-12 15:47:23 CDT. --
-- No entries --
[cody@nixos:~/hci/testdata/haskell-nix-stack-workflow]$ direnv allow
direnv: loading ~/hci/testdata/haskell-nix-stack-workflow/.envrc
Jul 12 15:47:37.141 WARN lorri daemon is not running and this project has not yet been evaluated, please run `lorri daemon`, expr: /home/cody/hci/testdata/haskell-nix-stack-workflow/shell.nix
direnv: export +IN_NIX_SHELL
[cody@nixos:~/hci/testdata/haskell-nix-stack-workflow]$ journalctl --user -u lorri --since='5 minutes ago'
-- Logs begin at Sun 2019-11-17 17:48:35 CST, end at Sun 2020-07-12 15:47:23 CDT. --
-- No entries --

No luck.

After rebooting things worked just fine and it doesn't look like anything changed:

[cody@nixos:~/hci]$ systemctl --user cat lorri
# /nix/store/nm14xpfpd2pirj1v2pvpd1kwk1yi02mh-home-manager-files/.config/systemd/user/lorri.service
[Service]
Environment=PATH=/nix/store/15wpim5v22189mrfrm6n8ggg410n3rvi-nix-2.3.6/bin:/nix/store/r1jr2v6afyhc19yx9i1ck230x9987c6q-git-2.27.0/bin:/nix/store/ldh4iwq6dxsv6cj78gv060wq9xa17n1h-gnutar-1.32/bin:/nix/store/yir8n5jfmcdmf0hyk712kl97krmdshlg-gzip-1.10/bin
ExecStart=/nix/store/7988ydmqdwk1p64sjysmi06bq523nbac-lorri/bin/lorri daemon
PrivateTmp=true
ProtectHome=read-only
ProtectSystem=strict
Restart=on-failure

[Unit]
After=lorri.socket
Description=lorri build daemon
RefuseManualStart=true
Requires=lorri.socket

Maybe this is more of a systemd/nixos bug? Maybe NixOS/nixpkgs#49528 is related here.

In the README it's suggested to do

 sudo systemctl --user daemon-reload

after adding lorri to the system environment. But I get this when typing in the command:

Failed to connect to bus: No such file or directory

(Using nixos-unstable and lorri installed is v1.2.0 if it matters)

I did not check if the whole thing works, but the fact that there's an error message when applying recipe from README is worrying.