Chapter 5 - webCrawler.py not working properly

Question



I think this code does not work properly: the result depends heavily on the number of threads you start.
The more threads you start, the more pages get crawled.
My guess is that the crawler threads exit when the queue is empty and never resume when new work is added to the queue.
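If threads really do quit on a momentarily empty queue, one standard pattern avoids that: workers block on `queue.Queue.get()` and never exit on their own, the main thread waits on `Queue.join()` (which returns only after every queued item has been matched by a `task_done()` call), and then sends one `None` sentinel per worker. This is a minimal sketch, not the book's code: the `LINKS` dictionary stands in for a real site, and the names `crawl` and `LINKS` are hypothetical. Fetching and HTML parsing are left out.

```python
import queue
import threading

# Hypothetical in-memory link graph standing in for a real site.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/", "/c"],
    "/b": ["/c", "/d"],
    "/c": ["/e"],
    "/d": [],
    "/e": ["/"],
}


def crawl(start, num_threads):
    """Crawl LINKS from `start` using `num_threads` workers; return the visited set."""
    to_crawl = queue.Queue()
    visited = {start}
    visited_lock = threading.Lock()

    def worker():
        while True:
            url = to_crawl.get()  # blocks while the queue is empty instead of exiting
            if url is None:       # sentinel from the main thread: no more work
                to_crawl.task_done()
                return
            try:
                for link in LINKS.get(url, []):
                    # Check-and-add under one lock so each page is queued once.
                    with visited_lock:
                        if link in visited:
                            continue
                        visited.add(link)
                    to_crawl.put(link)
            finally:
                # Children were put() before this call, so join() cannot return early.
                to_crawl.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    to_crawl.put(start)
    to_crawl.join()  # waits until every queued URL has been processed
    for _ in threads:
        to_crawl.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return visited
```

With this structure the thread count should only affect speed: `len(crawl("/", n))` is 6 for any `n >= 1`, because all six pages are reachable from `"/"`.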

Results reported by the crawler when crawling https://tutorialedge.net with different thread counts:

1 Thread: Total Number of Pages Visited 35
5 Threads: Total Number of Pages Visited 35
10 Threads: Total Number of Pages Visited 36
50 Threads: Total Number of Pages Visited 67
100 Threads: Total Number of Pages Visited 78

Answer 1 · 2021-12-24T01:58:05.000Z

Hi. You are close. The threads do not stop because the queue is empty; the problem is duplicates. Here is a detailed explanation: https://stackoverflow.com/questions/70468915/problems-with-webcrawler-implementation
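The linked answer has the specifics for the book's code. As a general illustration only (an assumption, not a summary of that answer), two common ways duplicates creep into a threaded crawler are a non-atomic "check, then add" on the visited set, and URLs that differ only cosmetically (a `#fragment` or a trailing slash). The sketch below guards against both. The `VisitedSet` class and its `normalize` rule are hypothetical; `urllib.parse.urldefrag` and `threading.Lock` are standard library.

```python
import threading
from urllib.parse import urldefrag


class VisitedSet:
    """Thread-safe record of URLs that some crawler thread has already claimed."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    @staticmethod
    def normalize(url):
        # Hypothetical normalization: drop the "#fragment" and any trailing slash,
        # so "https://tutorialedge.net/#top" and "https://tutorialedge.net" match.
        url, _fragment = urldefrag(url)
        return url.rstrip("/")

    def add_if_new(self, url):
        """Atomically claim `url`; return True only for the first caller."""
        key = self.normalize(url)
        with self._lock:  # the membership test and the insert happen as one step
            if key in self._seen:
                return False
            self._seen.add(key)
            return True

    def __len__(self):
        with self._lock:
            return len(self._seen)
```

A worker would call `add_if_new(link)` and enqueue `link` only when it returns `True`. Even if fifty threads race to claim the same page, exactly one of them wins, so each page is crawled once.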