aosabook/500lines

Crawler stuck on async queue.get()

YNX940214 opened this issue · 1 comment

Running crawler/code/crawl.py with python crawl.py xkcd.com --max_tasks 3 -v, the crawler gets stuck after fetching one or two URLs. After some logging and debugging, it seems the program hangs on url, max_redirect = yield from self.q.get(): there are plenty of elements left in the queue, but the get() never returns.
My environment is Windows 10, Python 3.8.
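
For context, this is roughly the worker loop in crawl.py that hangs (paraphrased from memory, so details may differ slightly from the repo):

```python
@asyncio.coroutine
def work(self):
    """Process queue items forever."""
    try:
        while True:
            url, max_redirect = yield from self.q.get()   # hangs here
            yield from self.fetch(url, max_redirect)      # if this raises, the worker dies
            self.q.task_done()                            # never reached for that item
    except asyncio.CancelledError:
        pass
```

If fetch() raises anything other than CancelledError, that worker task silently ends; with --max_tasks 3 there are only three workers, so after a few failures nothing is left to call q.get() or q.task_done(), and the crawl stalls.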

I found this is caused by an exception raised inside Crawler.fetch(), at yield from response.release(). I printed the exception; it says: 'noop' object is not iterable. But I can't figure out why this raises at all, since yield from response.release() should just return a noop. Also, why did the uncaught exception simply disappear (fetch() was launched inside an asyncio.Task)? It should have stopped the process, but it didn't, so the program only looks stuck. Can someone explain these two questions?
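
On the first question, my guess (not verified against the aiohttp source) is that in newer aiohttp versions release() returns an object that is awaitable but not iterable: it has __await__ but no __iter__, so it works with await in a native coroutine but fails with yield from in the generator-based coroutines crawl.py uses. A minimal sketch with a made-up Noop class that reproduces the same TypeError:

```python
import asyncio

class Noop:
    """Hypothetical stand-in for aiohttp's noop: awaitable, but not iterable."""
    def __await__(self):
        return iter(())          # awaiting it just completes immediately

async def native():
    await Noop()                 # fine: await goes through __await__
    print("await worked")

@asyncio.coroutine               # generator-based, like the coroutines in crawl.py
def generator_based():
    yield from Noop()            # TypeError: 'Noop' object is not iterable
    print("never reached")

loop = asyncio.get_event_loop()
loop.run_until_complete(native())
try:
    loop.run_until_complete(generator_based())
except TypeError as e:
    print("yield from failed:", e)
```

On the second question, an exception inside an asyncio.Task does not stop the event loop: it is stored on the Task and is only reported later, when someone retrieves the task's result or the task is garbage-collected ("Task exception was never retrieved"). As far as I can tell, crawl.py keeps the worker tasks in a list and the main coroutine keeps waiting on q.join(), so the exception never surfaces and the program just sits there. A minimal sketch (function names are mine):

```python
import asyncio

async def boom():
    raise RuntimeError("raised inside a Task")

async def main():
    task = asyncio.ensure_future(boom())   # like a worker task in crawl.py
    await asyncio.sleep(1)                 # the loop keeps running; nothing re-raises it
    print("still alive; stored exception:", task.exception())

asyncio.run(main())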