Sites should request an index before attempting an HTTP crawl

Question

Closed this issue 12 years ago · 0 comments

When Distru is told to index a site, it does not check whether that site is itself running Distru. If the target site (other than the local site, which is self-indexed) responds to a Distru request, possibly an Index.MergeRemote() restricted to the target site only, then it should not be actively crawled.

This would drastically reduce HTTP traffic if enough sites run Distru instances.
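
A minimal sketch of the proposed behavior, not Distru's actual code: ask the target for its index first, and fall back to the HTTP crawl only if that fails or if the target is the local site. Every name here (`Index`, `mergeFrom`, `RemoteFetcher`, `Crawler`, `IndexSite`, `ErrNotDistru`) is hypothetical and only stands in for whatever Distru really uses; the fetch and crawl steps are passed in as functions so that the decision logic can be shown without any network code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotDistru is a hypothetical error meaning the target did not
// answer the Distru index request.
var ErrNotDistru = errors.New("target is not running Distru")

// Index is a hypothetical stand-in for Distru's index type; only the
// set of indexed sites is modelled here.
type Index struct {
	Sites map[string]bool
}

// mergeFrom copies every site in other into idx (an illustrative
// helper, not the real Index.MergeRemote).
func (idx *Index) mergeFrom(other *Index) {
	for site := range other.Sites {
		idx.Sites[site] = true
	}
}

// RemoteFetcher asks a site for its own index over the Distru protocol.
type RemoteFetcher func(site string) (*Index, error)

// Crawler performs the ordinary HTTP crawl of a site.
type Crawler func(site string) (*Index, error)

// IndexSite tries the remote index request first and only crawls when
// the target is the local site or does not answer as a Distru peer.
// It reports which path was taken: "merged" or "crawled".
func IndexSite(local *Index, self, target string, fetch RemoteFetcher, crawl Crawler) (string, error) {
	if target != self {
		if remote, err := fetch(target); err == nil {
			local.mergeFrom(remote)
			return "merged", nil
		}
	}
	crawled, err := crawl(target)
	if err != nil {
		return "", err
	}
	local.mergeFrom(crawled)
	return "crawled", nil
}

func main() {
	local := &Index{Sites: map[string]bool{}}
	// Stub fetcher: pretend only peer.example runs Distru.
	fetch := func(site string) (*Index, error) {
		if site == "peer.example" {
			return &Index{Sites: map[string]bool{site: true}}, nil
		}
		return nil, ErrNotDistru
	}
	// Stub crawler: pretend the crawl always succeeds.
	crawl := func(site string) (*Index, error) {
		return &Index{Sites: map[string]bool{site: true}}, nil
	}
	for _, target := range []string{"peer.example", "plain.example"} {
		how, err := IndexSite(local, "self.example", target, fetch, crawl)
		fmt.Println(target, how, err)
	}
}
```

In this sketch any error from the fetch is treated as "not a Distru peer", so a temporarily unreachable instance would simply be crawled as before. Whether that fallback is desirable is a design choice the issue leaves open.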