Starting crawl from subdirectory
When I run the command with --site https://site/subdirector on my Mac, everything works as I'd like: it starts with that page, doesn't find a sitemap file, and so falls back to crawling from https://site/subdirector. On a Windows machine, however, the crawl starts at the domain root, https://site.
Is there a configuration option I can use to force it to start at the subdirectory? I tried -include /subdirector/.*, but that doesn't seem to do it; with that, it just hangs.
The debug output shows "GET /api/reports 200 object - 0ms" repeating over and over.
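For context, the include filter can also be declared in an unlighthouse.config.ts rather than as a CLI flag. This is only a minimal sketch assuming the documented scanner.include option, with /subdirector standing in for the real path; it is not a confirmed workaround for the hang described above.

```ts
// unlighthouse.config.ts — minimal sketch, assuming the scanner.include option
export default {
  site: 'https://site/subdirector',
  scanner: {
    // only queue routes matching these path patterns for scanning
    include: ['/subdirector/*'],
  },
}
```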
Mac:
Successfully connected to https://teamsideline.com/Layouts/minimalist/Home.aspx?d=ZHcj%2bsPHK5g%2bZkLyQaVo0Q%3d%3d/, status code: 200. unlighthouse 07:50:32
⛵ unlighthouse cli @ v0.5.0
▸ Scanning: https://teamsideline.com/Layouts/minimalist/Home.aspx?d=ZHcj%2bsPHK5g%2bZkLyQaVo0Q%3d%3d/
▸ Route Discovery: Crawler
Windows:
Successfully connected to https://teamsideline.com/. (Status: 200). Unlighthouse 2:50:40 PM
⛵ Unlighthouse cli @ v0.11.4
▸ Scanning: https://teamsideline.com/
▸ Route Discovery: Crawler
I notice this works with unlighthouse@0.5.1 but not with 0.6.0 or later.
Using --include-urls does not solve this either; it hangs the same way as described above.
Hi @Robanna777, thanks for the issue.
It seems this wasn't actually supported and only worked by accident in earlier versions. I've pushed up a fix for it; you can use it like so:
npx unlighthouse@0.11.5 --site https://teamsideline.com/sites/apex/home
Let me know if you have any issues with it.
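For anyone who prefers a config file over CLI flags, the same starting point can be expressed roughly like this (a sketch, assuming a standard unlighthouse.config.ts picked up from the working directory):

```ts
// unlighthouse.config.ts — sketch equivalent of
// `npx unlighthouse@0.11.5 --site https://teamsideline.com/sites/apex/home`
export default {
  // with the 0.11.5 fix, a site URL that includes a path is crawled from that path
  site: 'https://teamsideline.com/sites/apex/home',
}
```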
That's awesome. Thank you. That works perfectly.