micropython/micropython-esp32

Guru Meditation Error during socket.connect()

nickzoic opened this issue · 6 comments

If you try to socket.connect() to an unreachable TCP/IP address, it eventually (after ~15 seconds) returns with:
OSError: [Errno 113] EHOSTUNREACH

However, if you Ctrl-C during this time, the exception is immediately followed by a crash:

>>> s.connect(('10.107.1.6', 9999))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 113] EHOSTUNREACH
>>> NLR jump failed, val=0x3ffda814
Guru Meditation Error of type IllegalInstruction occurred on core  0. Exception was unhandled.

This occurs with network.WLAN() or the new network.LAN() adaptor.

(yes, I'll have a look at this but I'm adding it here so I don't forget)

Well, that explains recent problems I've had with WLAN.

Side note: even machine.reset() gives a Guru Meditation (ah, my Amiga days), so there's something weird with the current IDF being used.

It looks like the ctrl-C that you press during connect() is being buffered. Then the EHOSTUNREACH exception is being raised, but the ctrl-C is still pending. The ctrl-C is then raised in some strange location which leads to the crash.

Apart from this being a bug (whose cause may be difficult to track down), fixing connect() so that you can press ctrl-C to break out of it would require setting the socket to non-blocking at the start, then polling in a loop for the connect() to complete. In that loop you can check for ctrl-C explicitly (by calling mp_handle_pending()).
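For illustration, here is a rough sketch of that non-blocking connect plus polling loop, written in plain CPython rather than in the port's C code. The helper name connect_with_polling and its timeout_s and poll_interval_s parameters are made up for this example. In the actual esp32 port the loop would live in C and call mp_handle_pending(); here the sketch simply relies on the interpreter raising KeyboardInterrupt once each short select() wait returns.

```python
import errno
import os
import select
import socket
import time


def connect_with_polling(sock, address, timeout_s=15.0, poll_interval_s=0.1):
    """Connect by polling a non-blocking socket (hypothetical helper)."""
    old_timeout = sock.gettimeout()  # remember the caller's blocking mode
    sock.setblocking(False)
    try:
        err = sock.connect_ex(address)
        if err == 0:
            return  # connected immediately
        if err not in (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EALREADY):
            raise OSError(err, os.strerror(err))

        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            # Wait only briefly, so a pending Ctrl-C is handled between
            # iterations instead of being deferred for the whole connect.
            _, writable, _ = select.select([], [sock], [], poll_interval_s)
            if writable:
                # The socket becomes writable on success or failure;
                # SO_ERROR tells the two apart.
                err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
                if err:
                    raise OSError(err, os.strerror(err))
                return
        raise OSError(errno.ETIMEDOUT, os.strerror(errno.ETIMEDOUT))
    finally:
        sock.settimeout(old_timeout)  # restore the original mode
```

The timeout here is enforced by the loop itself, so it does not depend on the stack supporting a connect timeout option.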

Note: this stuff is already handled in esp8266 because it uses extmod/modlwip.c which wraps the lwIP stack at a lower level. And I don't think it's possible to hook into the esp32 lwIP stack at such a level, because it's probably not exposed and also there are multi-core issues to consider.

Same behaviour in v1.9.2-279-g090b6b80
Similar in v1.9.2-225-g75ead22c (no "Guru" message, but same "NLR jump failed")

@dpgeorge yeah, I was thinking that, we do similar things elsewhere in that library to "fake" timeouts.

> Apart from this being a bug (whose cause may be difficult to track down), fixing connect() so that you can press ctrl-C to break out of it would require setting the socket to non-blocking at the start, then polling in a loop for the connect() to complete. In that loop you can check for ctrl-C explicitly (by calling mp_handle_pending()).

All of the other socket stuff implements blocking/timeout with a loop, for this same reason. I think connect doesn't do this because lwIP doesn't seem to let you set a connect timeout.

In <IDF>/components/lwip/include/lwip/lwip/sockets.h

#define SO_CONTIMEO    0x1009 /* Unimplemented: connect timeout */

... and it's not implemented in the API, either. =(