Queries intermittently freezing asyncio event loop
davidmcnabnz opened this issue · 3 comments
Most of the time, aiodns is fine. But on rare occasions, it gets stuck on a C write() call deep within pycares.
This freezes the entire event loop indefinitely, because the write() call never returns.
My original calling code looks like this:
import aiodns
dns = aiodns.DNSResolver()
reply = await dns.query(somedomain, 'MX')  # inside an async function
For now, I'll look at workarounds such as moving all my aiodns queries off to separate threads, but that seems inefficient, so I'd welcome some advice on this.
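The thread workaround can be sketched roughly as follows, assuming the problematic call can be wrapped as a plain blocking function. Here blocking_lookup is a hypothetical stand-in (it just sleeps), and asyncio.to_thread (Python 3.9+) runs it in the default thread pool so that asyncio.wait_for on the loop thread can still time out. Two caveats: the worker thread is not killed on timeout, and this only keeps the loop responsive if the stuck call releases the GIL while it waits (time.sleep does; whether the write() in the trace below does is not established here). Running aiodns itself in a worker would also need a separate event loop in that thread, which this sketch sidesteps.

```python
import asyncio
import contextlib
import time


def blocking_lookup(name: str, delay: float) -> str:
    # Hypothetical stand-in for a resolver call that may hang in C code.
    # time.sleep releases the GIL, as a blocking syscall normally would.
    time.sleep(delay)
    return f"answer for {name}"


async def heartbeat(ticks: list) -> None:
    # Keeps ticking only while the event loop thread is free.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def lookup_with_timeout(name: str, delay: float, timeout: float):
    try:
        # The blocking call runs in the default thread pool, so the loop
        # thread stays free to deliver the timeout.
        return await asyncio.wait_for(
            asyncio.to_thread(blocking_lookup, name, delay), timeout
        )
    except asyncio.TimeoutError:
        # The worker thread is NOT interrupted; it finishes on its own.
        return None


async def main():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    fast = await lookup_with_timeout("example.com", 0.0, 1.0)
    slow = await lookup_with_timeout("example.org", 0.5, 0.1)
    hb.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await hb
    return fast, slow, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The heartbeat keeps ticking during the slow lookup, which is exactly what does not happen when the blocking call runs on the loop thread itself.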
Below is the py-spy stack trace of where the aiodns call is getting stuck.
Thread 380494 (idle): "MainThread"
write (libpthread-2.31.so)
_Py_DECREF (object.h:422)
_my_PyErr_WriteUnraisable (_cffi_backend.c:6113)
general_invoke_callback (_cffi_errors.h:147)
gil_release (misc_thread_common.h:370)
cffi_call_python (call_python.c:278)
_sock_state_cb (_cares.c:998)
open_udp_socket (ares_process.c:1240)
ares__send_query (ares_process.c:854)
ares_send (ares_send.c:131)
ares_query (ares_query.c:138)
_cffi_f_ares_query (_cares.c:3287)
_do_query (pycares/__init__.py:581)
query (pycares/__init__.py:561)
query (aiodns/__init__.py:90)
Because of the nature of this issue, asyncio timeout wrappers cannot help: once the thread's event loop is stuck inside a C function call, control never returns to the loop, so there's no way for a TimeoutError to be raised in the wrapper.
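A small sketch of why the timeout wrapper is powerless, assuming time.sleep on the loop thread is a fair stand-in for the stuck write() call (stuck_query is hypothetical). asyncio.wait_for asks for a 0.05 s timeout, but the timeout can only be delivered once the loop regains control, which happens only after the blocking call has already returned.

```python
import asyncio
import time


async def stuck_query() -> str:
    # Hypothetical stand-in for a query that blocks inside C code:
    # time.sleep() runs on the event loop thread and never yields.
    time.sleep(0.3)
    return "reply"


async def main(timeout: float = 0.05):
    start = time.monotonic()
    try:
        outcome = await asyncio.wait_for(stuck_query(), timeout)
    except asyncio.TimeoutError:
        outcome = "timed out"
    # Either way, we only get here after the full 0.3 s block.
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    outcome, elapsed = asyncio.run(main())
    print(f"{outcome!r} after {elapsed:.2f}s")
```

With a call that never returns, the same await simply never completes.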
I've also filed an issue with the pycares tracker:
What a weird one!
Drilling down, what happens is that pycares got some activity on a file descriptor and called the socket state callback, which aiodns uses:
Line 137 in 1c5f28f
Here is where pycares calls it: https://github.com/saghul/pycares/blob/de2ed40596f543f989bbcea30632be751133c110/src/pycares/__init__.py#L97
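For readers unfamiliar with the socket state callback: pycares calls it as sock_state_cb(fd, readable, writable) to say which file descriptors the embedding code should watch. Below is a minimal sketch of bridging such a callback onto asyncio's add_reader/add_writer, loosely modelled on the approach aiodns takes but not its actual code; SocketStateBridge and demo are hypothetical, and a socketpair stands in for the c-ares UDP socket. It needs a selector-based loop (the default on Linux). The key point for this bug is that the callback runs synchronously on the loop thread, inside the library call that opened the socket (open_udp_socket in the trace), so anything that blocks there blocks the loop.

```python
import asyncio
import socket


class SocketStateBridge:
    # Hypothetical sketch: map (fd, readable, writable) notifications from a
    # C library onto asyncio reader/writer registrations.

    def __init__(self, loop: asyncio.AbstractEventLoop, on_event):
        self.loop = loop
        self.on_event = on_event  # called as on_event(fd, "read" or "write")
        self.read_fds = set()
        self.write_fds = set()

    def sock_state_cb(self, fd: int, readable: bool, writable: bool) -> None:
        # Runs synchronously on the loop thread; blocking here blocks the loop.
        if readable:
            self.loop.add_reader(fd, self.on_event, fd, "read")
            self.read_fds.add(fd)
        elif fd in self.read_fds:
            self.loop.remove_reader(fd)
            self.read_fds.discard(fd)
        if writable:
            self.loop.add_writer(fd, self.on_event, fd, "write")
            self.write_fds.add(fd)
        elif fd in self.write_fds:
            self.loop.remove_writer(fd)
            self.write_fds.discard(fd)


async def demo():
    loop = asyncio.get_running_loop()
    events = []
    bridge = SocketStateBridge(loop, lambda fd, kind: events.append(kind))
    a, b = socket.socketpair()
    try:
        bridge.sock_state_cb(a.fileno(), True, False)   # "watch for reads"
        b.send(b"x")                                     # make `a` readable
        await asyncio.sleep(0.05)                        # let the loop poll
        bridge.sock_state_cb(a.fileno(), False, False)  # "socket closed"
    finally:
        a.close()
        b.close()
    return events, bridge.read_fds


if __name__ == "__main__":
    print(asyncio.run(demo()))
```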
Something seems to happen which causes an unraisable error: _my_PyErr_WriteUnraisable (_cffi_backend.c:6113)
and then it's the call that writes it to standard error which seemingly gets stuck.
Very weird.
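An "unraisable" exception is one raised where there is no Python caller to propagate it to, for example in `__del__` or, presumably, in a Python callback invoked from C through cffi. CPython (3.8+) hands these to sys.unraisablehook, whose default implementation writes the report to sys.stderr. The sketch below swaps that hook to capture the error instead of printing it; the Noisy class is hypothetical, and whether cffi's _my_PyErr_WriteUnraisable routes through this hook depends on the cffi and Python versions, so treat that part as an assumption.

```python
import sys


class Noisy:
    def __del__(self):
        # Raising here has no caller to propagate to, so CPython
        # reports it as "unraisable" instead.
        raise RuntimeError("error with nowhere to go")


def capture_unraisable():
    captured = []
    old_hook = sys.unraisablehook
    # Replace the default hook, which writes the report to sys.stderr.
    sys.unraisablehook = lambda info: captured.append(info.exc_type)
    try:
        obj = Noisy()
        del obj  # refcount drops to zero, so __del__ runs immediately
    finally:
        sys.unraisablehook = old_hook
    return captured


if __name__ == "__main__":
    print(capture_unraisable())
```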
On the pycares issue, you seem to be using 4.2, which is an older release. Can you please test with the latest version of both packages?
Also, a repro script would be useful, even if it takes hours to trigger the bug.
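A long-running repro could look roughly like this sketch. run_with_watchdog is hypothetical; it arms faulthandler.dump_traceback_later before each query, and because that watchdog runs in its own C thread, it still dumps every thread's Python stack to a file even if the event loop thread is stuck in C code. The aiodns part reuses the dns.query(name, 'MX') call from the report; the domain, round count, and dump file name are placeholders.

```python
import asyncio
import faulthandler


async def run_with_watchdog(query, names, rounds, dump_file, stall_seconds=30.0):
    # `query` is any coroutine function taking a name (hypothetical driver).
    completed = 0
    for _ in range(rounds):
        for name in names:
            # If this call takes longer than stall_seconds, dump all
            # thread stacks to dump_file from a separate watchdog thread.
            faulthandler.dump_traceback_later(stall_seconds, file=dump_file)
            try:
                await query(name)
            except Exception as exc:  # ordinary DNS errors are expected
                print(f"{name}: {exc!r}")
            finally:
                faulthandler.cancel_dump_traceback_later()
            completed += 1
    return completed


async def main():
    import aiodns  # assumed installed

    resolver = aiodns.DNSResolver()  # created inside the running loop

    async def mx(name):
        return await resolver.query(name, "MX")

    # Keep the file open for as long as the watchdog may write to it.
    with open("stall-dump.txt", "w") as dump:
        done = await run_with_watchdog(mx, ["example.com"], 10_000, dump)
    print(f"completed {done} queries; check stall-dump.txt for stalls")


if __name__ == "__main__":
    asyncio.run(main())
```

Writing the dump to a file rather than stderr matters here, since the trace suggests the stuck write() may itself be the error report.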