mafredri/zsh-async

Using zpty for another project, noticed a problem

psprint opened this issue · 2 comments

I use zpty to run services like py-http in the background with Zplugin (the zservices org gives a picture of what this is). After a few hours of two shells running, one of which had spawned two services, an additional zsh -i ... process appeared, consuming 100% CPU and having no children. I tried to investigate what that process was. I'm pretty sure there's no code path that could spawn a new process, but this could use a second look. However, the process responded to zpty -d zservices:redis by shutting down, so it looks like zpty spawned something?

So my question is: have you encountered something like this when working with zpty?

What version of zsh are you seeing this on? I can't say I've run into this exact issue myself, but I've seen some weird behavior with zptys from time to time. Some time ago I did see "dead"/crashed zpty processes take up 100% CPU, but IIRC this was due to stdin/stdout being broken, causing read to become non-blocking (I think that issue was fixed with this change:

zsh-async/async.zsh

Lines 138 to 145 in 001f40e

read -r -d $'\0' request || {
	# Since we handle SIGHUP above (and thus do not know when `zpty -d`)
	# occurs, a failure to read probably indicates that stdin has
	# closed. This is why we propagate the signal to all children and
	# exit manually.
	kill -HUP -$$  # Send SIGHUP to all jobs.
	exit 0
}
).
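The failure mode behind that fix can be reproduced portably: when stdin is at EOF or broken, read returns immediately instead of blocking, so a loop that ignores the failure spins at full speed. A minimal stand-alone demonstration (plain shell, not the actual zsh-async code):

```shell
# When stdin is closed/at EOF, read fails instantly instead of blocking.
# A loop that keeps calling read without handling the failure (the bug the
# kill -HUP/exit branch above guards against) would busy-loop at 100% CPU.
# We cap it at 3 iterations here just to show the immediate failures.
spin_count=0
while [ "$spin_count" -lt 3 ]; do
  if ! read -r line; then
    # Without an exit/break on failure, this branch runs back-to-back forever.
    spin_count=$((spin_count + 1))
  fi
done </dev/null
echo "read failed immediately $spin_count times"
```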

Is there something in your code that could cause the process to be spawned at a later time? Consider that a zpty is forked from the current shell at that point. Say that a zsh/sched was scheduled for 2h from now, this might also be triggered inside the zpty instance.
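The fork-inheritance point can be illustrated in any shell with a portable stand-in (zsh/sched itself would need zsh, and the job name here is purely illustrative): whatever state exists in the shell at fork time travels into the child, so a pending sched entry would be live inside the zpty instance too.

```shell
# Portable stand-in for the sched scenario: state present at fork time
# (a plain variable here; a pending sched entry in the real case) is
# inherited by the forked child, so it can take effect there as well.
pending_job='restart redis in 2h'    # illustrative, not real zservices code
child_view=$(echo "$pending_job")    # $(...) forks a subshell, as zpty does
echo "child inherited: $child_view"
```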

Are you able to reproduce this on a smaller scale or can you only trigger it when using your full shell configuration?

It turned out to be an issue similar to the read-on-stdin one in zsh-async. I'm using read -t 1 on a fifo to wait, instead of sleep, so as not to change the terminal's title to 'sleep'. zsystem flock has a bug (I've sent a fix to zsh-workers): it doesn't close the descriptor on an unsuccessful lock. After a few hours, OS X ran out of descriptors, and read ... <> .../fifo wasn't waiting because it couldn't redirect to the fifo. So the loop was running at full speed, taking 100% of CPU time. I must have misinterpreted pstree earlier, thinking it was a new process. I ran the updated code, using a subshell that closes the descriptor on an unsuccessful lock, for 12 hours and the issue didn't reappear. So zpty wasn't the cause here :)
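The mechanics of that subshell fix can be shown in isolation (a sketch in plain shell, not the actual zservices code or zsystem flock itself): a descriptor opened in the current shell stays open after a failed lock attempt and leaks, while a descriptor opened inside a subshell is closed automatically when the subshell exits.

```shell
# Why the subshell stops the fd leak: fds opened in a subshell die with it.
tmpfile=$(mktemp)

# Leaky pattern: fd 9 (standing in for the fd zsystem flock opened) stays
# open in this shell after the lock attempt, accumulating until the OS
# runs out of descriptors.
exec 9<>"$tmpfile"

# Fixed pattern: open the fd inside a subshell; it is closed on exit,
# whether or not the lock was acquired.
( exec 8<>"$tmpfile" )

leak_status=closed; { true >&9; } 2>/dev/null && leak_status=open
sub_status=closed;  { true >&8; } 2>/dev/null && sub_status=open
echo "fd opened in current shell: $leak_status"  # the leak
echo "fd opened in subshell:      $sub_status"   # the fix
rm -f "$tmpfile"
```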