Why does sitemap.xml get crawled before the passed URL?
emersonthis commented
Ex:
$ lighthouse-parade https://www.baptistjax.com
Created CSV file
Starting the crawl...
Crawled https://www.baptistjax.com/sitemap.xml [text/xml] (646288 bytes)
Crawled https://www.baptistjax.com/ [text/html; charset=utf-8] (289135 bytes)
Report is done for https://www.baptistjax.com/
Wrote report for https://www.baptistjax.com/
Crawled https://www.baptistjax.com/services [text/html; charset=utf-8] (246368 bytes)
Report is done for https://www.baptistjax.com/services
Wrote report for https://www.baptistjax.com/services
Crawled https://www.baptistjax.com/site-search [text/html; charset=utf-8] (244996 bytes)
Notice that `sitemap.xml` is crawled before the URL I requested. Why? Is this some internal logic of simplecrawler?
Possibly related to #3