TODO

- Good logging library instead of print(flush=True)

Crawler

- Honor robots.txt
- Limit max response size (2 MB, for example)
- Better filtering of unwanted URLs
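
A rough sketch of how the logging, robots.txt and response-size items might look, using only the Python standard library. The logger name, USER_AGENT, MAX_RESPONSE_BYTES and both helper functions are hypothetical names chosen for illustration, not existing code in this project, and the sketch assumes the robots.txt text has already been downloaded.

```python
import io
import logging
from urllib.robotparser import RobotFileParser

# Hypothetical names: logger name, user agent and size cap are illustrative.
logger = logging.getLogger("crawler")
USER_AGENT = "ExampleCrawler"
MAX_RESPONSE_BYTES = 2 * 1024 * 1024  # the 2 MB example limit from the TODO


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that was already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def read_capped(stream, limit: int = MAX_RESPONSE_BYTES):
    """Read at most `limit` bytes; return None if the body is larger."""
    # Ask for one extra byte so an oversized body can be detected
    # without reading all of it into memory.
    body = stream.read(limit + 1)
    if len(body) > limit:
        logger.warning("response exceeds %d bytes, skipping", limit)
        return None
    return body


if __name__ == "__main__":
    # Replaces print(flush=True): timestamps and levels come for free.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    robots = build_robots_parser("User-agent: *\nDisallow: /private/\n")
    for url in ("https://example.com/index.html", "https://example.com/private/x"):
        logger.info("%s allowed: %s", url, robots.can_fetch(USER_AGENT, url))
    logger.info("body: %r", read_capped(io.BytesIO(b"hello")))
```

In a real crawler, read_capped would be given the HTTP response object instead of an io.BytesIO. Whether a single read call returns the full capped body depends on the HTTP client in use, so that part would need checking against the actual client.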