jupyter-xeus/xeus-zmq

Possible channel polling issue after update to xeus-zmq 3.0, no shell messages

Closed this issue · 7 comments

I'm running a xeus server hosted in a Qt-based application with a Python kernel (similar to Slicer's usage of xeus). Polling of the channels is triggered by a QTimer at 10 ms intervals.

  m_pollTimer = new QTimer();
  m_pollTimer->setInterval(10);
  QObject::connect(m_pollTimer, &QTimer::timeout, [this]() {
      // Poll the shell and control channels with a 10 ms timeout and
      // dispatch the message (if any) to the matching listener.
      auto msg = poll_channels(10);
      if (msg)
      {
          if (msg.value().second == xeus::channel::SHELL)
          {
              notify_shell_listener(std::move(msg.value().first));
          }
          else
          {
              notify_control_listener(std::move(msg.value().first));
          }
      }
  });

This worked with the previous versions before these updates were applied:

xeus 3.1.4 -> 5.1.0
xeus-zmq 1.1.1 -> 3.0.0
xeus-python 0.15.2 -> 0.17.0

Code was adjusted to the new API (e.g. xserver_zmq::poll became xserver_zmq::poll_channels).

However, messages now come in only on the control channel, not on the shell channel, even though Wireshark clearly shows messages being sent on the shell channel (on the shell port). (The Jupyter server and kernel are on localhost.)

As a result, the code execution handler execute_request_impl (and the other shell handlers) is never called.

There is a comment in the xserver_zmq.hpp file about the need for some functions to be called in the same thread:

        // The following methods must be called in the same thread
        using message_channel = std::pair<xmessage, channel>;
        std::optional<message_channel> poll_channels(long timeout);
        void send_shell_message(xmessage msg);
        void send_control_message(xmessage msg);

Could my issue be related to this?

P.S. I also tried following the example in xeus-qt, where a separate poller thread is used and the messages are returned via signals, but the behaviour was the same.

Despite several days of debugging I couldn't get to the bottom of this.

Everything is now rolled back to versions slightly newer than the previous setup (but before the major version changes) and it all works again with these versions:

xeus 3.2.0
xeus-zmq 1.3.0
xeus-python 0.16.0

Code was adjusted to the new API (e.g. xserver_zmq::poll became xserver_zmq::poll_channels).

I just had a quick look at the implementation of both methods and they look pretty similar to me, so I'm not sure the change in the API should introduce any errors.

For what it's worth, we also implement the start_impl method (which basically starts the xserver from xeus) in a similar way to what you've written above:

void xserver_zmq_default::start_impl(xpub_message msg)
{
    start_publisher_thread();
    start_heartbeat_thread();
    set_request_stop(false);
    publish(std::move(msg), channel::SHELL);
    while (!is_stopped())
    {
        auto msg = poll_channels(-1);
        if (msg)
        {
            if (msg.value().second == channel::SHELL)
            {
                notify_shell_listener(std::move(msg.value().first));
            }
            else
            {
                notify_control_listener(std::move(msg.value().first));
            }
        }
    }
    stop_channels();
}

I haven't dug deeper into your issue, but at a surface level things look OK to me.

There is a comment in the xserver_zmq.hpp file about the need for some functions to be called in the same thread:
[...]
Could my issue be related to this?

I don't think so; this is to avoid ending up with two different threads writing to the same ZMQ socket, which usually results in exceptions or crashes.
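
To make that concrete, here is a minimal sketch (not code from this thread; my_qt_server and post_shell_reply are hypothetical names, and the class is assumed to derive from xeus::xserver_zmq and to live in the main thread together with the polling QTimer shown in the first comment): if another thread ever needs to trigger a send, queue the call onto the polling thread instead of calling the send methods directly.

    #include <utility>           // std::move
    #include <QObject>           // QMetaObject::invokeMethod, Qt::QueuedConnection (Qt >= 5.10)
    #include "xeus/xmessage.hpp" // xeus::xmessage

    void my_qt_server::post_shell_reply(xeus::xmessage msg)
    {
        // Hop onto this object's (main) thread so that poll_channels() and
        // send_shell_message() always run on the same thread and never touch
        // the ZMQ socket concurrently.
        QMetaObject::invokeMethod(
            this,
            [this, m = std::move(msg)]() mutable { send_shell_message(std::move(m)); },
            Qt::QueuedConnection);
    }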

Is the code of your kernel available somewhere? We have different kernels based on xeus-zmq that work out of the box, including the example in xeus-qt, so I'm pretty sure we are missing something.

Ok, I think I found the culprit; see here and here. This would explain why you receive control messages but no shell messages. The default implementation works thanks to this line, but this is very error-prone and should not be required.

Can you try adding this line in your codebase and check whether it fixes the issue? I need to check that default-initializing this variable to true does not break anything else; if it doesn't, I will publish a patched version so that this workaround is no longer needed.
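
For illustration only (the exact line is behind the links above; my_qt_server and m_pollTimer are the hypothetical names from the first comment, not xeus-zmq API), a start_impl that mirrors the quoted default implementation before handing control to the polling timer would look roughly like this:

    void my_qt_server::start_impl(xeus::xpub_message msg)
    {
        // Same set-up sequence as xserver_zmq_default::start_impl quoted above ...
        start_publisher_thread();
        start_heartbeat_thread();
        set_request_stop(false);                        // clear the stop flag before the first poll
        publish(std::move(msg), xeus::channel::SHELL);  // announce that the kernel is starting
        // ... but the blocking while loop is replaced by the 10 ms QTimer callback.
        m_pollTimer->start();
    }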

Thanks @JohanMabille! I'll give that another go.

By the way the jupyter-xeus infrastructure is a major help to integrators like myself. I'd really like to see this work get more publicity.

Thanks! Happy that this stack helps; we will advertise more once the upgrade to xeus 5 (with support for async execution) is done.