alexander-bauer/distru

Sites should request an index before attempting to do the http crawl

Closed this issue · 0 comments

When Distru is given the command to index a site, it pays no attention to whether or not that site is itself running Distru. Except when self-indexing, Distru should first send the target site a Distru request (possibly an Index.MergeRemote() restricted to the target site only). If the target responds to that request, it should not be actively crawled.

This will drastically reduce the amount of HTTP traffic if enough sites run Distru instances.
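
A minimal sketch of the proposed flow, in Go. It is not Distru's actual code: `Index`, `Fetcher`, `IndexSite`, and `ErrNoDistru` are hypothetical names invented for illustration, and the two `Fetcher` arguments stand in for whatever Distru uses to request a remote index (for example something built on Index.MergeRemote()) and to perform the HTTP crawl.

```go
package main

import (
	"errors"
	"fmt"
)

// Index is a hypothetical stand-in for Distru's index type:
// it maps a site name to the pages known for that site.
type Index map[string][]string

// Fetcher is a hypothetical function type that produces an index for a site,
// either by asking a remote Distru instance or by crawling over HTTP.
type Fetcher func(site string) (Index, error)

// ErrNoDistru is a hypothetical error a remote-index Fetcher can return
// when the target does not answer the Distru request.
var ErrNoDistru = errors.New("target is not running Distru")

// IndexSite asks the target for its index first and only falls back to an
// active crawl if that request fails. The local site (self) is always
// crawled directly. The boolean result reports whether a crawl happened.
func IndexSite(site, self string, requestIndex, crawl Fetcher) (Index, bool, error) {
	if site != self {
		// Any failure here (including ErrNoDistru) means we fall back to crawling.
		if remote, err := requestIndex(site); err == nil {
			return remote, false, nil
		}
	}
	crawled, err := crawl(site)
	if err != nil {
		return nil, true, fmt.Errorf("crawling %s: %w", site, err)
	}
	return crawled, true, nil
}
```

With this shape, a call such as `IndexSite("peer.example", "self.example", requestIndex, crawl)` never invokes `crawl` when the peer answers the index request, which is where the traffic saving would come from.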