lgraubner/sitemap-generator-cli

This will go around in circles crawling the same pages over and over.

superlowburn opened this issue · 3 comments

Please provide an example or describe the problem further. The crawler should ignore already fetched pages.
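To make "ignore already fetched pages" concrete, here is a minimal sketch of the usual approach: keep a set of URLs that have already been queued and skip any link that is in it. This is not the tool's actual implementation. `crawl`, `get_links` and the `SITE` toy data are hypothetical names used only for illustration, and stripping the `#fragment` before comparing is an assumption about what should count as the same page.

```python
from collections import deque
from urllib.parse import urldefrag


def crawl(start, get_links):
    """Breadth-first crawl that visits each URL at most once.

    get_links is a hypothetical callable: URL -> iterable of linked URLs.
    """
    seen = {start}          # every URL that has ever been queued
    queue = deque([start])
    order = []              # URLs in the order they were visited
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            # Assumption: "page#top" and "page" are the same page.
            link = urldefrag(link).url
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Toy site with circular links, standing in for real HTTP fetches.
SITE = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/", "http://example.com/b#top"],
    "http://example.com/b": ["http://example.com/a"],
}

pages = crawl("http://example.com/", lambda u: SITE.get(u, []))
print(pages)
```

Even though the toy pages link to each other in a circle, each URL is visited only once, because a link is queued only the first time it is seen.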

Hi,

./sitemap-generator -bq gbiz.org sitemap.xml --verbose

I repeatedly get the same URLs being crawled.
There are too many to copy and paste here.

I started a crawl and it looks like it does what it should: each page is only added once. Could you paste an example URL that is added more than once?
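One way to find such a URL without reading the whole output is to count the `<loc>` entries in the generated file. This is only a sketch and assumes the output follows the standard sitemaps.org format. `duplicate_locs` and the `SAMPLE` data are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace defined by the sitemaps.org 0.9 protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def duplicate_locs(xml_text):
    """Return {url: count} for every <loc> that appears more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        loc.text.strip() for loc in root.iter(NS + "loc") if loc.text
    )
    return {url: n for url, n in counts.items() if n > 1}


# Made-up sample data; for a real run, pass in the contents of sitemap.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/a</loc></url>
</urlset>"""

dupes = duplicate_locs(SAMPLE)
print(dupes)
```

An empty result means every URL in the file is listed only once.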

Also, you don't need the -b flag if your entry point is the home page. It matches all pages anyway.