scrool/xled

Bad File Descriptor when getting second device from xdiscover

wackazong opened this issue · 15 comments

When using xdiscover and having two devices on the network, the following example code causes an exception on the second next call

    devices = []
    discovered_devices = xled.discover.xdiscover()
    while True:
        try:
            device = next(discovered_devices)
            devices.append(device)
        except StopIteration:
            break
    return devices
ERROR:asyncio:Exception in callback BaseAsyncIOLoop._handle_events(24, 1)
handle: <Handle BaseAsyncIOLoop._handle_events(24, 1)>
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/Users/alex/.local/share/virtualenvs/wakeupcall-EJzfLl60/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 138, in _handle_events
    handler_func(fileobj, events)
  File "/Users/alex/.local/share/virtualenvs/wakeupcall-EJzfLl60/src/xled/xled/discover.py", line 442, in handle_beacon
    data, host = self._next_packet()
  File "/Users/alex/.local/share/virtualenvs/wakeupcall-EJzfLl60/src/xled/xled/discover.py", line 421, in _next_packet
    data, host = self.udp.recv(64)
  File "/Users/alex/.local/share/virtualenvs/wakeupcall-EJzfLl60/src/xled/xled/udp_client.py", line 110, in recv
    buf, addrinfo = self.handle.recvfrom(bufsize)
OSError: [Errno 9] Bad file descriptor

If I comment out the line

self.stop()

then the generator returns all devices but never returns.

This indeed looks like a bug and a fix that you have proposed might be the correct way to go. I need to find out if we wouldn't end up with loop that never ends though.s

On the other hand the snippet you have provided indeed never ends - there is nothing that would raise StopIteration. You'll have to handle that on your own for now - e.g. end the loop after some timeout or look at discovered devices and end when you find all of those you were looking for.

Please try latest changes in develop branch.

On the top of that if you would like to get all devices on the network you can now use timeout like this:

    devices = []
    discovered_devices = xled.discover.xdiscover(timeout=3)
    while True:
        try:
            device = next(discovered_devices)
            devices.append(device)
        except StopIteration:
            break
    return devices

And in devices you'll have all devices that responded until 3 seconds.

Thanks for the quick reaction! In the example you give a DiscoverTimeout Exception is raised when the timeout is reached. This is not taken into account in the example and the execution is halted. Why are you raising the Exception instead of just returning when the timeout is reached?

Like this it works nicely. Thanks for that!

    devices = []
    discovered_devices = xled.discover.xdiscover(timeout=3)
    while True:
        try:
            device = next(discovered_devices)
            devices.append(device)
        except StopIteration:
            break
        except DiscoverTimeout:
            break
    return devices

Why are you raising the Exception instead of just returning when the timeout is reached?

I'd like caller to get a reason why xdiscover returned. It could also fail with error.

If there is no device at all the call next(discovered_devices still hangs.

If there is no device at all the call next(discovered_devices still hangs.

With a timeout? I cannot reproduce that. If I don't have my Twinkly turned on and I run the same code I get DiscoverTimeout exception.

Yes, with timeout set. Ok, I will try and put together a test case.

I can reproduce it with the script above.

Stack trace when I interrupt is:

^CTraceback (most recent call last):
  File "twinkly/test_timeout.py", line 8, in <module>
    device = next(discovered_devices)
  File "/home/pi/wakeupcall/.venv/src/xled/xled/discover.py", line 74, in xdiscover
    response = interface.recv()
  File "/home/pi/wakeupcall/.venv/src/xled/xled/discover.py", line 199, in recv
    return self.pipe.recv_multipart()
  File "/home/pi/wakeupcall/.venv/lib/python3.7/site-packages/zmq/sugar/socket.py", line 475, in recv_multipart
    parts = [self.recv(flags, copy=copy, track=track)]
  File "zmq/backend/cython/socket.pyx", line 791, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 827, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 186, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/checkrc.pxd", line 12, in zmq.backend.cython.checkrc._check_rc
KeyboardInterrupt

It's always the same stack trace.

I'm wondering if you can test this on the top of latest changes?

Hm, when I run the test case with 0.6.1 installed and no devices on the network the script still does not return.

Stack trace when I interrupt:

    ^CTraceback (most recent call last):
      File "test_xled.py", line 9, in <module>
        device = next(discovered_devices)
      File "/home/pi/wakeupcall/.venv/lib/python3.6/site-packages/xled/discover.py", line 74, in xdiscover
        response = interface.recv()
      File "/home/pi/wakeupcall/.venv/lib/python3.6/site-packages/xled/discover.py", line 199, in recv
        return self.pipe.recv_multipart()
      File "/home/pi/wakeupcall/.venv/lib/python3.6/site-packages/zmq/sugar/socket.py", line 583, in recv_multipart
        parts = [self.recv(flags, copy=copy, track=track)]
      File "zmq/backend/cython/socket.pyx", line 781, in zmq.backend.cython.socket.Socket.recv
      File "zmq/backend/cython/socket.pyx", line 817, in zmq.backend.cython.socket.Socket.recv
      File "zmq/backend/cython/socket.pyx", line 186, in zmq.backend.cython.socket._recv_copy
      File "zmq/backend/cython/checkrc.pxd", line 13, in zmq.backend.cython.checkrc._check_rc
    KeyboardInterrupt

Python 3.6.3, using Poetry Package Manager.
Skript used:

    import xled
    from xled.exceptions import DiscoverTimeout
    devices = []
    discovered_devices = xled.discover.xdiscover(timeout=3)
    while True:
        print('trying')
        try:
            device = next(discovered_devices)
            print('got device')
            devices.append(device)
        except StopIteration:
            break
        except DiscoverTimeout:
            break
    print(devices)

When I have devices on the network the script returns and works as expected.

Hm, when I run the test case with 0.6.1 installed and no devices on the network the script still does not return.

From what I can tell Bad File Descriptor is no longer the case here. I'm not able to reproduce infinite loop though.

I'm using separate virtual environment on the top of latest commit in develop:

$ git describe
v0.6.1-42-g0e22aaa
$ python3.6 -m venv ~/.virtualenvs/xled@py36      
$ . ~/.virtualenvs/xled@py36/bin/activate
$ python --version
Python 3.6.12
$ python setup.py develop
...
error: The 'requests' distribution was not found and is required by xled

I think this is chicken-egg problem with pip so for now I'm dealing with this by just second run:

$ python setup.py develop
...
Finished processing dependencies for xled==0.6.1
$ pip freeze
certifi==2020.12.5
cffi==1.14.4
chardet==4.0.0
click==8.0.0a1
click-log==0.3.2
cryptography==3.3.1
idna==2.10
importlib-resources==5.1.0
netaddr==0.8.0
pycparser==2.20
pyzmq==22.0.2
requests==2.25.1
requests-toolbelt==0.9.1
six==1.15.0
tornado==6.1
urllib3==1.26.3
-e git+git@scrool.github.com:scrool/xled.git@0e22aaa4ae90e678b045d934a816c2cfff53d11d#egg=xled
zipp==3.4.0
$ cat test_28.py
import xled
from xled.exceptions import DiscoverTimeout
devices = []
discovered_devices = xled.discover.xdiscover(timeout=3)
while True:
    print('trying')
    try:
        device = next(discovered_devices)
        print('got device')
        devices.append(device)
    except StopIteration:
        break
    except DiscoverTimeout:
        break
print(devices)

a) with no devices:

$ timeout -v 10 /usr/bin/time -p python test_28.py
trying
[]
real 4.58
user 0.48
sys 0.06

it ends later than 3 seconds because of python startup overhead but in less than 10 seconds after timeout would kill the script.

b) with one device:

$ timeout -v 10 /usr/bin/time -p python test_28.py
trying
got device
trying
[DiscoveredDevice(hw_address='xx:xx:xx:xx:xx:xx', id='Twinkly_XXXXXX', ip_address='192.168.1.X')]
real 3.66
user 0.52
sys 0.07

c) with two devices:

$ timeout -v 10 /usr/bin/time -p python test_28.py
trying
got device
trying
got device
trying
[DiscoveredDevice(hw_address='xx:xx:xx:xx:xx:xx', id='Twinkly_XXXXXX', ip_address='192.168.1.X'), DiscoveredDevice(hw_address='yy:yy:yy:yy:yy:yy', id='Twinkly_YYYYYY', ip_address='192.168.1.Y')]
real 3.61
user 0.53
sys 0.07

Please (re)install dependencies if possible and list your packages and their versions with pip freeze.

I have multiple devices now and I haven't been able to reproduce this issue nor seen it occurring randomly. I'm going to close this one out.