cloudfour/lighthouse-parade

Why does sitemap.xml get crawled before the passed URL?


Ex:

$ lighthouse-parade https://www.baptistjax.com
Created CSV file
Starting the crawl...
Crawled https://www.baptistjax.com/sitemap.xml [text/xml] (646288 bytes)
Crawled https://www.baptistjax.com/ [text/html; charset=utf-8] (289135 bytes)
Report is done for https://www.baptistjax.com/
Wrote report for https://www.baptistjax.com/
Crawled https://www.baptistjax.com/services [text/html; charset=utf-8] (246368 bytes)
Report is done for https://www.baptistjax.com/services
Wrote report for https://www.baptistjax.com/services
Crawled https://www.baptistjax.com/site-search [text/html; charset=utf-8] (244996 bytes)

Notice that `sitemap.xml` is crawled before the URL I requested. Why? Is this part of simplecrawler's internal logic?

Possibly related to #3