Gops hangs if there are too many Go processes to examine
siebenmann opened this issue · 17 comments
If you have more than ten (I believe) Go processes, gops will hang in goprocess/gp.go's `FindAll()` function. This happens because the `for .. range pss { ... }` loop attempts to obtain a `limitCh` token before starting the goroutine, but the goroutines only release their token after they have sent their reply to the `found` channel. Since the `found` channel is unbuffered and is only read from by the main code after the `for .. range pss {}` loop finishes, this deadlocks whenever the number of goroutines would actually be limited.
I think either the entire `for .. range pss { ... }` loop needs to be in a goroutine so that it doesn't block reading from the `found` channel, or the `limitCh` token must be released before sending to `found`. The former change seems easier, and I think it's correct.
This is impacting me as well. Was able to get around it temporarily by upping the magic number to 100 without a problem, though that's obviously not the right fix.
@siebenmann @dbraley could you please check if this PR fixes this issue.
Purely on inspection I believe that this will fix the issue I'm experiencing, however I believe it will also reintroduce issue #118. Since `limitCh` is now sent to in a new goroutine that has no other purpose, there's nothing to stop a flood of goroutines calling `isGo()` and using too many file descriptors at once.
@siebenmann thanks for the insight. I have modified the code to use `wg` instead of `limitCh`.
The change introduced above (see #125) moves the whole for-loop into a goroutine (also suggested in the issue description). Ignoring the whitespace changes (`git diff -b`) is by far the best way to understand the diff.
It seems to me the simplest fix would be to make `found` buffered, with size `concurrencyProcesses`.
I personally prefer to avoid buffered channels, but yeah -- it's possible as well.
UPD: See the comment below, where I explain why this actually isn't gonna work with a `concurrencyProcesses`-sized buffer.
Actually, I did not initially notice the `concurrencyProcesses` buffer limit proposed by @josharian. Imho it isn't gonna help. Imagine that we have more than `concurrencyProcesses` Go processes. After the first batch of `concurrencyProcesses` workers we will fill the output buffer, and we will get into exactly the same situation as we are in right now. The only correct way in this case is to create a buffer of the future `len(results)`, which is unknown, and the nearest known value would be `len(pss)`.
But by that point, receiving from `found` will have commenced. In any case, there's no need to speculate. That's what tests are for. :)
But reading from `found` happens after the main for-loop. It will never get there, because it will get stuck in the same way it gets stuck now.
I see now. Using `len(pss)` would work.
@josharian I've refactored my CL and made it testable.
UPD: Removed "Please try with `len(pss)` -- it will hang." Of course it will work. Sorry, seems I have problems finishing reading sentences today (-;
@josharian Added a fix-123-buffered branch. Please take a look and feel free to try it out yourself.
UPD: And yes, surely it would work with `len(pss)`, as we've agreed earlier.
Oops, sorry, forgot to push the changes. Now it's there.
Also added a version which creates fewer goroutines and doesn't use buffered channels. Please take a look: https://github.com/yarcat/gops/blob/fix-123-no-buffers/goprocess/gp.go (the branch is still based on PR #125)
Sorry for the delay! Can confirm this fixed the problem for me as well, thanks @rakyll, @yarcat, @mseshachalam, & @josharian!