openlawlibrary/pygls

Infinite loop when not exited properly

Closed this issue · 4 comments

Initially found this issue by using jedi-language-server and killing emacs with SIGKILL.

I think it boils down to a simple reproducer by using the pygls json example:

# cd examples/json-extension/
# cat bug.py 
import subprocess

subprocess.run(
    ["python3", "-m", "server"], stdin=subprocess.PIPE, stdout=subprocess.PIPE
)

Then start it python3 run.py, and kill it pkill -9 -f bug.py, by doing so, pygls hits 100% CPU (see strace below).

My initial reproducer with emacs and jedi-language-server was:

  • Start emacs 27.1 with lsp-jedi==20200812.1826 and lsp-mode==20210410.1801, open a Python file to start jedi-language-server
  • Violently kill -9 the emacs to let no chance to emacs to notify, kill, or wait jedi-language-server
  • Now orphan jedi-language-server process gets adopted by init
  • jedi-language-server enters an infinite loop at 100% CPU

I straced jedi-language-server at 100% cpu and noticed a short loop of like 30 syscalls, I then straced pygls json example and the strace looks exactly the same:

strace of the loop
221210 recvfrom(4, "\0", 4096, 0, NULL, NULL) = 1
221210 recvfrom(4, 0x32064d0, 4096, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
221210 epoll_wait(3, [], 1, 0)          = 0
221210 stat("/usr/lib/python3.9/asyncio/events.py",  <unfinished ...>
221210 <... stat resumed>{st_mode=S_IFREG|0644, st_size=26378, ...}) = 0
221210 stat("/home/mdk/.local/lib/python3.9/site-packages/pygls/server.py",  <unfinished ...>
221210 <... stat resumed>{st_mode=S_IFREG|0644, st_size=12840, ...}) = 0
221210 stat("/home/mdk/.local/bin/jedi-language-server", {st_mode=S_IFREG|0755, st_size=221, ...}) = 0
221210 stat("/usr/lib/python3/dist-packages/click/core.py", {st_mode=S_IFREG|0644, st_size=77650, ...}) = 0
221210 stat("/home/mdk/.local/lib/python3.9/site-packages/jedi_language_server/cli.py", {st_mode=S_IFREG|0644, st_size=292, ...}) = 0
221210 stat("/usr/lib/python3.9/asyncio/futures.py", {st_mode=S_IFREG|0644, st_size=14018, ...}) = 0
221210 stat("/usr/lib/python3.9/asyncio/base_events.py", {st_mode=S_IFREG|0644, st_size=73809, ...}) = 0
221210 stat("/usr/lib/python3.9/asyncio/events.py", {st_mode=S_IFREG|0644, st_size=26378, ...}) = 0
221210 stat("/home/mdk/.local/lib/python3.9/site-packages/pygls/server.py", {st_mode=S_IFREG|0644, st_size=12840, ...}) = 0
221210 stat("/usr/lib/python3.9/asyncio/futures.py", {st_mode=S_IFREG|0644, st_size=14018, ...}) = 0
221210 stat("/usr/lib/python3.9/asyncio/base_events.py", {st_mode=S_IFREG|0644, st_size=73809, ...}) = 0
221210 sendto(5, "\0", 1, 0, NULL, 0)   = 1
221210 epoll_wait(3, [{EPOLLIN, {u32=4, u64=4}}], 1, 0) = 1
221210 stat("/usr/lib/python3.9/asyncio/events.py", {st_mode=S_IFREG|0644, st_size=26378, ...}) = 0
221210 stat("/home/mdk/.local/lib/python3.9/site-packages/pygls/server.py", {st_mode=S_IFREG|0644, st_size=12840, ...}) = 0
221210 stat("/home/mdk/.local/lib/python3.9/site-packages/jedi_language_server/cli.py", {st_mode=S_IFREG|0644, st_size=292, ...}) = 0
221210 stat("/usr/lib/python3.9/asyncio/futures.py", {st_mode=S_IFREG|0644, st_size=14018, ...}) = 0
221210 stat("/usr/lib/python3.9/asyncio/base_events.py", {st_mode=S_IFREG|0644, st_size=73809, ...}) = 0
221210 stat("/usr/lib/python3.9/asyncio/events.py", {st_mode=S_IFREG|0644, st_size=26378, ...}) = 0
221210 stat("/home/mdk/.local/lib/python3.9/site-packages/pygls/server.py", {st_mode=S_IFREG|0644, st_size=12840, ...}) = 0
221210 stat("/home/mdk/.local/lib/python3.9/site-packages/jedi_language_server/cli.py", {st_mode=S_IFREG|0644, st_size=292, ...}) = 0
221210 stat("/usr/lib/python3.9/asyncio/futures.py", {st_mode=S_IFREG|0644, st_size=14018, ...}) = 0
221210 stat("/usr/lib/python3.9/asyncio/base_events.py", {st_mode=S_IFREG|0644, st_size=73809, ...}) = 0

Looks like stdin.readline does not throw BrokenPipeError here. Do you have any idea how to check if PIPE is still open?

Just curious, does this happen with VS Code? They use child processes, not sure if those are killed when the main process is forcibly killed.

Looks like stdin.readline does not throw BrokenPipeError here. Do you have any idea how to check if PIPE is still open?

As far as I know, readline will return an empty bytes instance when the file is closed or pipe is broken. On all other "lines" the newline is included so the result is not empty, allowing to take appart empty lines and end of files.

I tried using loop.connect_read_pipe and loop.connect_write_pipe but couldn't make it work yet. That way connection_lost method of protocol would be triggered when a pipe is broken, and we could also reuse reading messages logic in data_received.

I think we would need to fix the custom transport and pass it to the protocol manually, as we are doing already.

It sounds like @JulienPalard is right and checking for a total lack of data returned is an indication the pipe is closed:

https://stackoverflow.com/questions/49501263/python-and-asyncio-closed-named-pipe-always-available-for-reading