emacs-jupyter/jupyter

Help debugging why emacs-jupyter has stopped working


I have been using emacs-jupyter in org-mode for a long time, but sometime in the last week it stopped working. I suspect it broke after I updated macOS to Sequoia, although it works fine on another Mac where I did the same update.

It seems to time out while requesting kernel info. I can see a Python process running in Activity Monitor, but eventually it either hangs or dies with a message like "EINTR:" or `cl--assertion-failed: Assertion failed: (jupyter-alive-p kernel)`. I tried rebuilding the zmq library and updating Jupyter and Emacs, but it has not helped.

The problem seems to be in jupyter--start-kernel-process, but I am not sure how to proceed with fixing it.

I am running GNU Emacs 30.0.91 (build 1, x86_64-apple-darwin24.0.0, NS appkit-2566.00 Version 15.0 (Build 24A335)) of 2024-10-01.

Launching JupyterLab with `jupyter lab` works fine.

jupyter --version
Selected Jupyter core packages...
IPython          : 8.15.0
ipykernel        : 6.25.0
ipywidgets       : 8.0.4
jupyter_client   : 7.4.9
jupyter_core     : 5.3.0
jupyter_server   : 2.12.1
jupyterlab       : 4.2.5
nbclient         : 0.10.0
nbconvert        : 7.16.4
nbformat         : 5.9.2
notebook         : 7.2.1
qtconsole        : 5.4.2
traitlets        : 5.7.1

Any ideas?

You can launch a kernel and, even if you get the "Assertion failed" error, you should still be able to read the kernel process's stdout to see whether any errors popped up before it was killed. That assertion is essentially `(process-live-p (jupyter-process kernel))`, so it seems that the kernel process is dying before the launch completes.
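To see why that check fails, a small helper can report whether the process is still live and, if not, how it exited. This is a minimal sketch using only built-in Emacs process functions; `my/process-report` is a hypothetical name, not part of emacs-jupyter, and the usage line assumes `kernel` holds a kernel object created with `jupyter-kernel` and launched.

```elisp
;; Hypothetical helper (not part of emacs-jupyter): describe whether PROC is
;; still live and, if it has died, how it exited.
(defun my/process-report (proc)
  "Return a short description of PROC's liveness, status and exit code."
  (if (process-live-p proc)
      (format "%s is live (status: %s)" (process-name proc) (process-status proc))
    ;; For a dead process, `process-exit-status' is the exit code,
    ;; or the signal number if it was killed by a signal.
    (format "%s died (status: %s, exit code or signal: %d)"
            (process-name proc)
            (process-status proc)
            (process-exit-status proc))))

;; Usage, assuming `kernel' was created with `jupyter-kernel' and launched:
;; (message "%s" (my/process-report (jupyter-process kernel)))
```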

To just launch a kernel (without sending any messages to it):

(setq kernel (jupyter-kernel
              :spec (jupyter-guess-kernelspec "python")))
(jupyter-launch kernel) ; Don't worry about the `jupyter-alive-p` assertion if it happens

and then to get to the process buffer:

(pop-to-buffer (process-buffer (jupyter-process kernel))) 

To delete the kernel process:

(jupyter-shutdown kernel)

I'm not sure how the EINTR error is being raised. Do you get it as part of a zmq error, i.e., do you see "Error in ZMQ subprocess" in the minibuffer?

Since you are building the zmq library manually, could you go into its folder and run the tests with `make test` to see whether they all pass?

Well, good news: the zmq tests pass. (I guess they are not included in the ELPA package; I had to get them from the repo.)

I am not sure how I got the EINTR message; I haven't seen it in these tests. Maybe it isn't relevant.

I have been able to run some of the code you suggested.

This worked, I think:

#+BEGIN_SRC emacs-lisp
(setq kernel (jupyter-kernel
              :spec (jupyter-guess-kernelspec "python")))
(jupyter-launch kernel)
#+END_SRC

It ran without error once, but as I kept trying to figure out what is happening, I started getting errors like this:

zmq.error.ZMQError: Address already in use (addr='tcp://127.0.0.1:51955')

I can see some Python processes running, but killing them doesn't seem to help or release the port; instead, a different port is reported as already in use. I guess this is an error on the Python side. Here is the full traceback:
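To see which process, if any, is still listening on a reported port, one option is to ask `lsof` from Emacs. This sketch assumes `lsof` is installed (it ships with macOS); `my/who-listens-on` is a hypothetical helper, and 51955 is just the port from the error above. The PID column of the result identifies the process holding the port.

```elisp
;; Hypothetical helper: ask lsof which processes hold a listening TCP socket
;; on PORT.  Assumes the `lsof' executable is on `exec-path'.
(defun my/who-listens-on (port)
  "Return lsof's report of processes listening on TCP PORT, or nil if none."
  (with-temp-buffer
    ;; Destination (t nil): stdout goes into this buffer, stderr is discarded.
    (let ((status (call-process "lsof" nil '(t nil) nil
                                "-nP" (format "-iTCP:%d" port) "-sTCP:LISTEN")))
      ;; lsof exits with status 1 when nothing matches.
      (when (eql status 0)
        (buffer-string)))))

;; Usage, with the port from the error message:
;; (message "%s" (or (my/who-listens-on 51955) "nobody is listening"))
```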

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/jkitchin/anaconda3/lib/python3.11/site-packages/ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "/Users/jkitchin/anaconda3/lib/python3.11/site-packages/traitlets/config/application.py", line 991, in launch_instance
    app.initialize(argv)
  File "/Users/jkitchin/anaconda3/lib/python3.11/site-packages/traitlets/config/application.py", line 113, in inner
    return method(app, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jkitchin/anaconda3/lib/python3.11/site-packages/ipykernel/kernelapp.py", line 689, in initialize
    self.init_sockets()
  File "/Users/jkitchin/anaconda3/lib/python3.11/site-packages/ipykernel/kernelapp.py", line 328, in init_sockets
    self.shell_port = self._bind_socket(self.shell_socket, self.shell_port)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jkitchin/anaconda3/lib/python3.11/site-packages/ipykernel/kernelapp.py", line 252, in _bind_socket
    return self._try_bind_socket(s, port)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jkitchin/anaconda3/lib/python3.11/site-packages/ipykernel/kernelapp.py", line 228, in _try_bind_socket
    s.bind("tcp://%s:%i" % (self.ip, port))
  File "/Users/jkitchin/anaconda3/lib/python3.11/site-packages/zmq/sugar/socket.py", line 302, in bind
    super().bind(addr)
  File "zmq/backend/cython/socket.pyx", line 564, in zmq.backend.cython.socket.Socket.bind
  File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use (addr='tcp://127.0.0.1:51982')

I am not sure what I could have done to cause this.

It could be because of how emacs-jupyter finds open ports. To get a set of open ports on a system, I essentially shell out to the `jupyter kernel` subcommand solely to read the ports it assigns to the channels of the kernel it creates. This mainly supports launching kernels on remote systems, where it is not easy or convenient to determine a set of open ports (see `jupyter-session-with-random-ports`). Before the actual kernel is launched, the kernel started by `jupyter kernel` is killed, which should release the ports so that the actual kernel can use them.
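For comparison, here is a sketch of a different technique for the same goal, not what emacs-jupyter does: let the operating system pick free ports by binding listening sockets with `:service t`, read the chosen ports back with `process-contact`, and then close the sockets. `my/random-free-ports` is a hypothetical name. Like any approach that releases the ports before the kernel binds them, it is racy: another process can grab a port in between, which would also produce "Address already in use".

```elisp
;; Sketch of a different technique from emacs-jupyter's: let the OS choose
;; free ports, record them, then release them.
(defun my/random-free-ports (n)
  "Return a list of N distinct TCP ports on 127.0.0.1 that were free just now."
  (let ((servers nil))
    (unwind-protect
        (progn
          ;; Keep every probe socket open until all N are chosen, so the
          ;; OS cannot hand out the same port twice.
          (dotimes (i n)
            (push (make-network-process
                   :name (format "port-probe-%d" i)
                   :server t
                   :host "127.0.0.1"
                   :service t          ; t means "let the system pick a port"
                   :family 'ipv4
                   :noquery t)
                  servers))
          ;; `process-contact' reports the port that was actually bound.
          (mapcar (lambda (s) (process-contact s :service)) servers))
      ;; Release the ports.  Another process may grab them before the
      ;; kernel binds them; that race is inherent to this approach.
      (mapc #'delete-process servers))))

;; Usage (one port per kernel channel: shell, iopub, stdin, control, hb):
;; (my/random-free-ports 5)
```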

I'm not sure whether killing the `jupyter kernel` process before it has shut down gracefully has anything to do with it. The logic `jupyter-session-with-random-ports` uses to remove that process is essentially:

  1. Call interrupt-process (cause the process to run its shutdown procedure)
  2. Wait until the connection file it created has been deleted
  3. Call delete-process (send SIGKILL)
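The three steps above can be sketched with built-in Emacs primitives. This is an illustration under assumptions, not the emacs-jupyter code: `my/stop-process-gracefully` is a hypothetical name, CONN-FILE stands for the connection file written by the `jupyter kernel` process, and the 5-second default timeout is arbitrary. Step 2 polls until the file is actually gone or the timeout expires.

```elisp
;; Sketch of the three-step shutdown: interrupt, wait for the connection
;; file to be deleted, then force-kill.  Hypothetical helper, not the
;; emacs-jupyter implementation.
(defun my/stop-process-gracefully (proc conn-file &optional timeout)
  "Stop PROC, giving it up to TIMEOUT seconds (default 5) to delete CONN-FILE.
Return non-nil if CONN-FILE was gone before PROC was force-killed."
  (let ((deadline (+ (float-time) (or timeout 5))))
    ;; 1. Send SIGINT so the process can run its own shutdown procedure.
    (when (process-live-p proc)
      (interrupt-process proc))
    ;; 2. Poll until the connection file is gone or the deadline passes;
    ;;    `sleep-for' lets Emacs handle process output and sentinels meanwhile.
    (while (and (file-exists-p conn-file)
                (< (float-time) deadline))
      (sleep-for 0.1))
    (prog1 (not (file-exists-p conn-file))
      ;; 3. Whatever is left gets killed; `delete-process' sends SIGKILL.
      (delete-process proc))))
```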

I did find a bug: step (2) wasn't actually waiting until the connection file was deleted before moving on to step (3), so I think there is a chance the process did not terminate gracefully. But I don't know whether that would leave the ports considered in use even after the process that opened them has been terminated. I'm also not sure why `delete-process` would be needed at all, since `interrupt-process` already tells the `jupyter kernel` process to shut down.
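One way to check whether a port is still considered in use is simply to try binding it from Emacs. A minimal sketch, assuming the kernel binds on 127.0.0.1 as in the traceback above; `my/port-free-p` is hypothetical. Note that Emacs server sockets allow address reuse by default (`:reuseaddr`), so a port lingering in TIME_WAIT may be reported as free.

```elisp
;; Hypothetical check: can a listening socket be bound to PORT right now?
(defun my/port-free-p (port &optional host)
  "Return non-nil if TCP PORT on HOST (default \"127.0.0.1\") can be bound now."
  (condition-case nil
      (progn
        ;; Binding succeeds only if no other socket holds the address;
        ;; the probe socket is closed immediately afterwards.
        (delete-process
         (make-network-process :name "port-check" :server t
                               :host (or host "127.0.0.1")
                               :service port :family 'ipv4 :noquery t))
        t)
    ;; Typically "Address already in use".
    (error nil)))

;; Usage, with the port from the traceback:
;; (my/port-free-p 51982)
```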

Just for testing purposes, could you experiment with how long to wait between the call to `jupyter-session-with-random-ports` and the rest of the kernel launch? You could redefine the following method with a longer `sleep-for`:

(cl-defmethod jupyter-launch :before ((kernel jupyter-kernel-process))
  (pcase-let (((cl-struct jupyter-kernel-process session) kernel))
    (unless session
      (setf (jupyter-kernel-session kernel) (jupyter-session-with-random-ports))
      (sleep-for 1.0))))

This didn't seem to change anything, at least with delays of up to 5 seconds. It still seems to hang on "Requesting kernel info...". Calling `(jupyter-session-with-random-ports)` on its own seems to work fine. It is really weird...

I finally saw EINTR again. It happened when I simply typed C-c C-c to run an org src block. This is all that was in the *Messages* buffer:

Executing Jupyter-Python unknown at position 1005...
Launching python3 kernel...
Starting python3 kernel process...done
Launching python3 kernel...done
Requesting kernel info...
Loading /Users/jkitchin/.emacs.d/elpa/zmq-20240716.2000/emacs-zmq (module)...done
EINTR: "Interrupted system call"
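To find out where an error like this EINTR comes from, one option is to have Emacs open a backtrace instead of only printing the message. According to the Emacs Lisp manual, when `debug-on-error` is non-nil, errors in process filters and sentinels are no longer caught automatically, so they reach the debugger too. This is a general Emacs debugging sketch, not something specific to emacs-jupyter.

```elisp
;; Open a *Backtrace* buffer the next time an error is signaled.
(setq debug-on-error t)

;; Reproduce the problem (e.g. C-c C-c on the src block), inspect the
;; *Backtrace* buffer, then turn it off again:
;; (setq debug-on-error nil)
```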

I tried running

#+BEGIN_SRC emacs-lisp
(pop-to-buffer (process-buffer (caar jupyter--kernel-processes)))
#+END_SRC

but didn't see anything out of the ordinary:

NOTE: When using the `ipython kernel` entry point, Ctrl-C will not work.

To exit, you will have to explicitly quit this process, by either sending
"quit" from a client, or using Ctrl-\ in UNIX-like environments.

To read more about this, see https://github.com/ipython/ipython/issues/2049


To connect another client to this kernel, use:
    --existing emacs-kernel-okpUB7.json
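Rather than relying on the internal variable `jupyter--kernel-processes`, one could also look the kernel process up by its command line among all of Emacs's subprocesses (`M-x list-processes` shows the same information interactively). A sketch; `my/find-processes-by-command` is a hypothetical helper, and the regexp `"ipykernel"` is an assumption based on the `ipykernel_launcher` module in the traceback above.

```elisp
;; Hypothetical helper: find Emacs subprocesses by their command line.
(defun my/find-processes-by-command (regexp)
  "Return a list of (NAME PID STATUS) for subprocesses whose command matches REGEXP."
  (let (result)
    (dolist (p (process-list) (nreverse result))
      ;; `process-command' is a list of strings for real subprocesses,
      ;; and nil or t for network, serial and pipe connections.
      (let ((cmd (process-command p)))
        (when (and (consp cmd)
                   (string-match-p regexp (mapconcat #'identity cmd " ")))
          (push (list (process-name p) (process-id p) (process-status p))
                result))))))

;; Usage: list kernel processes, then open one's buffer with
;; (pop-to-buffer (process-buffer (get-process NAME))).
;; (my/find-processes-by-command "ipykernel")
```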

Now it gets a little weirder. I tried running the src block again, and it worked; for reasons unknown to me, everything seems to be working again. So weird. It even seems to have survived restarting Emacs. I do not know what I did to fix it; it could be some updated packages in Emacs or in Homebrew. I may never know. Thanks for the tips on getting into the internals, though! I will leave this open for a bit in case the problem comes back, but if I don't add anything in a few days, or if you want to close it, go ahead.