Crawler indexes all pages on a particular domain rather than only pages under a path
Amolith opened this issue · 1 comment
Amolith commented
When running Lieu over all the sites in the fediring, we've found that the crawler is bound only by domain rather than by domain+path. This causes quirks with static site hosts like cronut.cafe, where each user lives under their own path on a shared domain. The only cronut.cafe user who's also a member of the ring is ~sfr, but multiple other users who aren't members have been indexed as well: https://search.fediring.net/?q=cronut
I think a good solution might be keeping track of not only the domain that's being crawled but also the original URL and ignoring links to parent directories.
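To make the idea concrete, here's a rough sketch of what such a check could look like. It is not taken from Lieu's code: `scopePrefix`, `inScope`, and `inScopeRaw` are hypothetical names, and `~other` in the usage note is a made-up user. It also assumes that a seed path whose last segment has a file extension (like `index.html`) points at a file, so the scope is its directory, which is only a heuristic.

```go
package main

import (
	"net/url"
	"path"
	"strings"
)

// scopePrefix derives the path prefix the crawl should stay under.
// Assumption (heuristic): a last segment with a file extension is a file,
// so its directory is used; anything else is treated as a directory.
func scopePrefix(seed *url.URL) string {
	p := seed.Path
	if p == "" {
		return "/"
	}
	if path.Ext(p) != "" {
		p = path.Dir(p) // "/~sfr/index.html" -> "/~sfr"
	}
	if !strings.HasSuffix(p, "/") {
		p += "/"
	}
	return p
}

// inScope reports whether an absolute link is on the seed's host and
// at or below the seed's path prefix.
func inScope(seed, link *url.URL) bool {
	if !strings.EqualFold(seed.Hostname(), link.Hostname()) {
		return false
	}
	// path.Clean resolves "." and ".." so "/~sfr/../~other/" cannot slip through.
	cleaned := path.Clean("/" + link.Path)
	if !strings.HasSuffix(cleaned, "/") {
		cleaned += "/" // so "/~sfr" matches prefix "/~sfr/" but "/~sfrx" does not
	}
	return strings.HasPrefix(cleaned, scopePrefix(seed))
}

// inScopeRaw parses both URLs and resolves the link against the seed first.
// A real crawler would resolve against the page the link was found on.
func inScopeRaw(seedRaw, linkRaw string) (bool, error) {
	seed, err := url.Parse(seedRaw)
	if err != nil {
		return false, err
	}
	ref, err := url.Parse(linkRaw)
	if err != nil {
		return false, err
	}
	return inScope(seed, seed.ResolveReference(ref)), nil
}
```

With a seed of `https://cronut.cafe/~sfr/`, a link to `https://cronut.cafe/~sfr/blog/post.html` would be kept, while links to `https://cronut.cafe/`, `https://cronut.cafe/~other/`, or `../~other/` would be skipped. Sites that are listed with just a bare domain would get the prefix `/`, so their behaviour shouldn't change.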