player processes get stuck
fleimgruber opened this issue ยท 14 comments
player processes get stuck in "starting" state and no further playback is possible
๐ Bug report ๐
Description
After one or two alda play -c ...
the processes get stuck in the starting state and no further playback is possible.
Steps to Reproduce
- alda -v2 play -c "piano: c12 e g > c4"
- alda -v2 play -c "piano: c12 e g > c4"
- alda -v2 play -c "piano: c12 e g > c4"
- ...
Yes, but the number of stuck processes varies from 2-3.
Exact score does not matter, I tried with alda -v2 play -c "piano: c12 e g > c4"
and alda play -c "midi-bass-pop: c6 d12 e6 g12~4"
Expected Behavior
Sound plays consistently after each call to alda play ...
Actual Behavior
$ alda ps
id port state expiry type
itk 44295 starting 6 minutes from now player
ypb 37647 starting 7 minutes from now player
zxc 41825 starting 5 minutes from now player
$ alda -v2 play -c "piano: c12 e g > c4"
Mar 20 14:25:36 INF parser/parser.go:791 > Parsed input. filepath= took="373.503ยตs"
Mar 20 14:25:36 INF cmd/play.go:243 > Constructed score. took="45.406ยตs" updates=6
Starting player processes...
It looks like Alda is having trouble starting player processes in the
background. This could happen for a number of reasons.
To troubleshoot:
โข Run alda doctor and see if any of the health checks fail.
โข Run alda ps to see the state of any currently running player
processes.
โข Look for error messages in:
/home/fps/.cache/alda/logs/alda-player.log
โข Try running a player process in the foreground:
alda-player -v run -p 27278
โข Try to make it play something:
alda -v2 play -p 27278 -c "piano: c12 e g > c4"
Environment
Operating system and version:
GNU/Linux NixOS 21.11
Alda version:
$ alda version
alda 2.0.6
$ alda-player info
alda-player 2.0.6
log path: /home/fps/.cache/alda/logs
Health check:
$ alda doctor
OK Parse source code
OK Generate score model
OK Ensure that there are no stale player processes
OK Find an open port
OK Send and receive OSC messages
OK Locate alda-player executable on PATH
OK Check alda-player version
OK Spawn a player process
OK Ping player process
OK Play score
OK Export score as MIDI
OK Locate player logs
OK Player logs show the ping was received
OK Shut down player process
OK Spawn a player on an unknown port
OK Discover the player
OK Ping the player
OK Shut the player down
OK Start a REPL server
nREPL server started on port 41669 on host localhost - nrepl://localhost:41669
OK Find the REPL server
OK Interact with the REPL server
OK Shut down the REPL server
Logs:
104 โ clp INFO 2022-03-20 14:29:20 Main.run:77 - Starting receiver, listening on port 38477...
105 โ clp INFO 2022-03-20 14:29:20 MidiEngine.<init>:276 - Initializing MIDI sequencer...
106 โ clp INFO 2022-03-20 14:29:20 MidiEngine.<init>:281 - Initializing MIDI synthesizer...
107 โ clp INFO 2022-03-20 14:29:23 Main.run:90 - Starting player...
108 โ clp DEBUG 2022-03-20 14:29:23 Parser.parse:238 - received ping
109 โ clp DEBUG 2022-03-20 14:29:23 Player.invoke:346 - TRACK 1: startOffset is 0
110 โ clp DEBUG 2022-03-20 14:29:23 Player.invoke:349 - eraBefore: 0; eraAfter: 0
111 โ clp INFO 2022-03-20 14:29:23 Main.invoke:95 - Stopping receiver...
112 โ tta INFO 2022-03-20 14:29:27 StateManager.cleanUpStaleStateFiles:72 - Cleaning up stale files in /home/fps/.cache/alda/state/players...
113 โ tta INFO 2022-03-20 14:29:27 StateManager.cleanUpStaleStateFiles:72 - Cleaning up stale files in /home/fps/.cache/alda/state/repl-servers...
114 โ tta INFO 2022-03-20 14:29:27 Main.run:77 - Starting receiver, listening on port 46593...
115 โ tta INFO 2022-03-20 14:29:27 MidiEngine.<init>:276 - Initializing MIDI sequencer...
116 โ tta INFO 2022-03-20 14:29:27 MidiEngine.<init>:281 - Initializing MIDI synthesizer...
117 โ tta INFO 2022-03-20 14:29:30 Main.run:90 - Starting player...
118 โ tta DEBUG 2022-03-20 14:29:30 Parser.parse:238 - received ping
119 โ tta INFO 2022-03-20 14:29:30 Main.invoke:95 - Stopping receiver...
120 โ tta INFO 2022-03-20 14:29:30 Main.invoke:97 - Stopping player...
Hi @fleimgruber , thanks for reporting this!
From the logs you provided, it looks like a handful of player processes (itk
, ypb
and zxc
) recorded that they were in the starting
state and then they apparently died before they were able to log anything. ๐ค In the log, there are only entries for 2 player processes, clp
and tta
, and no error messages for those 2 processes.
I do think that Alda can self-correct to recover from a situation like this, but it takes 2 minutes. Every time you run an alda
command, it looks for "stale" player state files (which means any files in ~/.cache/alda/state/players/<VERSION>
that were last modified more than 2 minutes ago) and deletes them. Alda will then start new player processes to replace the stale/dead ones.
Would you mind running watch -n 0.25 alda ps
and keep an eye on the output, and observe what happens when your player processes get stuck? If you wait 2 minutes, do the starting
processes disappear from the alda ps
output?
Another thing that might be helpful is to monitor the actual processes on your system by running watch -n 0.25 pgrep -af alda-player
. If the number of player processes reported by alda ps
is more than the number of java
processes in the pgrep
output, that would confirm my theory above about the player processes dying without cleaning up their state files.
Thanks for taking a look at this!
watch -n 0.25 alda ps
Every 0.2s: alda ps
id port state expiry type
itk 44295 starting 4 hours ago player
ypb 37647 starting 4 hours ago player
zxc 41825 starting 4 hours ago player
alda play --code "midi-bass-pop: c6 d12 e6 g12~4"
does not change anything in alda ps
and prints the mentioned
Starting player processes...
It looks like Alda is having trouble starting player processes in the
background. This could happen for a number of reasons.
To troubleshoot:
...
watch -n 0.25 pgrep -af alda-player
9411 java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Xmx1024m -Xms256m -DlogPath=tmplog -jar /nix/store/04p5x5y0lrs4ppz9
9clnhx5ddkhvvv9q-alda-2.0.6/bin/.alda-player-wrapped run
9497 java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Xmx1024m -Xms256m -DlogPath=tmplog -jar /nix/store/04p5x5y0lrs4ppz9
9clnhx5ddkhvvv9q-alda-2.0.6/bin/.alda-player-wrapped run
9499 java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Xmx1024m -Xms256m -DlogPath=tmplog -jar /nix/store/04p5x5y0lrs4ppz9
9clnhx5ddkhvvv9q-alda-2.0.6/bin/.alda-player-wrapped run
it looks for "stale" player state files (which means any files in ~/.cache/alda/state/players/ that were last modified more than 2 minutes ago) and deletes them.
The associated player state files seem to get updated somehow and thus not deleted by the logic you mentioned:
ls -la ~/.cache/alda/state/players/2.0.6/itk.json
-rw-r--r-- 1 fps users 64 Mar 20 19:23 /home/fps/.cache/alda/state/players/2.0.6/itk.json
Interesting! So it looks like those 3 processes are not dying, rather, they are recording that they're in the starting
state and then apparently not getting past that point.
I'm not sure at the moment how that could happen ๐ค
Just to confirm: the first few times you ran alda play ...
, the playback worked?
Another thing worth trying: run alda shutdown
to send a "shutdown" message to each of the stuck player processes. I'm curious if that works. If it does, it means that the player processes are actually running, but they aren't updating their state files.
Just to confirm: the first few times you ran alda play ..., the playback worked?
It is not reliable, sometimes the playback only works for one or two of those and subsequent ones are silent. Sometimes they work a couple of times (usually if I leave ample time between them).
Another thing worth trying: run alda shutdown to send a "shutdown" message to each of the stuck player processes.
alda shutdown
does not shut down the "starting" processes.
It might have to do with NixOS, the way it wraps alda or some weird sound system interaction? See also https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/interpreters/alda/default.nix. If we don't find a lead or a problem with alda itself, I will open an issue there as well just to confirm it is not a NixOS bug...
alda shutdown
does not shut down the "starting" processes.
Is there any error or warning message? Can you try it with alda -v3 shutdown
?
It might have to do with NixOS, the way it wraps alda or some weird sound system interaction? See also https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/interpreters/alda/default.nix. If we don't find a lead or a problem with alda itself, I will open an issue there as well just to confirm it is not a NixOS bug...
I don't really understand NixOS. It could very well be the case that something weird is happening involving the way the Alda Nix package is set up. I figure it wouldn't hurt to at least open an issue with Nix, link to this issue and see if they have any ideas.
Is there any error or warning message? Can you try it with alda -v3 shutdown?
$ alda -v3 shutdown
Shutting down player processes.
Mar 21 18:42:26 INF cmd/shutdown.go:68 > Sent "shutdown" message to player process. player={"Expiry":1647885032161,"ID":"naj","Port":40575,"ReadError":null,"State":"starting"}
Mar 21 18:42:26 DBG cmd/root.go:53 > Waiting for background activity to complete. activity="clean up renamed executables"
$ alda ps
id port state expiry type
naj 40575 starting 7 minutes from now player
OK, so that looks like the client successfully sent the player the "shutdown" message, which means the player was listening and received the message. ๐ค
But then the player didn't shut down, and it remained in the starting state?
But then the player didn't shut down, and it remained in the starting state?
Yep:
$ alda ps
id port state expiry type
naj 40575 starting 5 minutes from now player
Might this give a hint activity="clean up renamed executables"
combined with NixOS handling of binary paths (basically symlinks to derivations, which roughly correspond to a system installation state, comparable to a git commit)?
That part is unrelated. It's an implementation detail of the way that alda update
works -- it renames the current executable (old version) of Alda and downloads the new version in its place. Then the next time you run Alda, it deletes the old, renamed executables.
I noticed another thing now: When I let it sit for a while, the timer counts backwards and beyond, for example:
$ alda ps
id port state expiry type
naj 40575 starting 50 seconds ago player
when a new shutdown is sent afterwards via alda -v3 shutdown
(the command output is the same) then the timer is reset to 8 minutes:
$ alda ps
id port state expiry type
naj 40575 starting 8 minutes from now player
maybe this gives an indication of which code path was hit.
Good find! Every time a player process receives any message, it starts the timer over. This is an inactivity timer designed to prevent player processes from running forever in the background and wasting your CPU.
Once the timer reaches 0, the process should shut itself down and when you run alda ps
again, you shouldn't see it there.
It seems like for some reason, the player process is having trouble getting past the starting
state and is not responding to "shutdown" messages. ๐ค
I tried updating to 2.2.0 and can't reproduce the issue there. I sent a PR to NixOS upstream so this might have been solved in newer alda along the way already.
Weird, but I'm happy that the issue is fixed!