XeniaRieger/Modern-Search-Engines

Crawler Bugs


  • [] Language detection is unreliable: webpages that are mainly in German are sometimes detected as English and vice versa. The problem is that the tokens cover ALL content of the page (including the header, ...) and not just its actual information.

  • [] Some websites are marked as "not allowed to crawl" although they are allowed: e.g. https://www.bandsintown.com/robots.txt permits crawling, but our crawler refuses to crawl the site.

  • [] SSL certificate issue -> see question for Tutorial 2
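
A possible starting point for the first two items, as a minimal sketch rather than the crawler's actual code. It assumes the crawler is written in Python and uses only the standard library. The names `SKIPPED_TAGS`, `MainTextExtractor`, `extract_main_text` and `is_allowed` are hypothetical, and the choice of which tags count as boilerplate is an assumption. The idea is to run language detection only on text outside header/nav/footer-like elements, and to check robots.txt rules on text that was already downloaded, so the rule check can be debugged separately from the download.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Assumption: these tags hold boilerplate rather than the page's information.
SKIPPED_TAGS = {"script", "style", "header", "nav", "footer", "aside"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any of SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_main_text(html):
    """Return the page text to hand to the language detector."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt content that was already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

One thing worth checking for the second item, if the crawler relies on `urllib.robotparser`: `RobotFileParser.read()` treats a 401 or 403 response for robots.txt itself as "disallow everything". A server that rejects the crawler's request for robots.txt would then make every URL on that site look forbidden, even though the file permits crawling. Whether this is what happens for bandsintown.com has not been verified here.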