sdg002/AnyDotnetStuff

Site crawler encounters an error while parsing links on the premierleague.com site

Closed this issue · 0 comments

Which site?

WebsiteCrawler.exe --maxsites 10 --url https://www.premierleague.com/

What is the error?

2022-03-15 21:37:51,608 [7] INFO  WebsiteCrawler.Service.SingleThreadedWebSiteCrawler [(null)] - Downloaded page:https://www.premierleague.com/, Content is 176443 characters long
2022-03-15 21:37:51,664 [7] INFO  WebsiteCrawler.Service.SingleThreadedWebSiteCrawler [(null)] - Found 283 hyperlinks in the page https://www.premierleague.com/
2022-03-15 21:37:51,666 [7] INFO  WebsiteCrawler.Service.SingleThreadedWebSiteCrawler [(null)] - Found a link '#mainNav'
2022-03-15 21:37:51,667 [7] INFO  WebsiteCrawler.Service.SingleThreadedWebSiteCrawler [(null)] - Ignoring link #mainNav
2022-03-15 21:37:51,668 [7] INFO  WebsiteCrawler.Service.SingleThreadedWebSiteCrawler [(null)] - Found a link '#mainContent'
2022-03-15 21:37:51,668 [7] INFO  WebsiteCrawler.Service.SingleThreadedWebSiteCrawler [(null)] - Ignoring link #mainContent
2022-03-15 21:37:51,669 [7] INFO  WebsiteCrawler.Service.SingleThreadedWebSiteCrawler [(null)] - Found a link '        http://www.arsenal.com?utm_source=premier-league-website&utm_campaign=website&utm_medium=link
'
System.UriFormatException: Invalid URI: The Authority/Host could not be parsed.
   at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
   at System.Uri.CreateUri(Uri baseUri, String relativeUri, Boolean dontEscape)
   at System.Uri..ctor(Uri baseUri, String relativeUri)
   at WebsiteCrawler.Infrastructure.extensions.UrlExtensions.Combine(String parentUrl, String childUrl) in C:\Users\saurabhd\MyTrials\AnyDotnetStuff\WebsiteCrawler\WebsiteCrawler.Infrastructure\extensions\UrlExtensions.cs:line 37
   at WebsiteCrawler.Service.SingleThreadedWebSiteCrawler.<>c__DisplayClass10_0.<DiscoverLinks>b__0(String link) in C:\Users\saurabhd\MyTrials\AnyDotnetStuff\WebsiteCrawler\WebsiteCrawler.Service\SingleThreadedWebSiteCrawler.cs:line 112
   at System.Collections.Generic.List`1.ForEach(Action`1 action)
   at WebsiteCrawler.Service.SingleThreadedWebSiteCrawler.DiscoverLinks(String startingSite) in C:\Users\saurabhd\MyTrials\AnyDotnetStuff\WebsiteCrawler\WebsiteCrawler.Service\SingleThreadedWebSiteCrawler.cs:line 82
   at WebsiteCrawler.Service.SingleThreadedWebSiteCrawler.Run(String url, Int32 maxPagesToSearch) in C:\Users\saurabhd\MyTrials\AnyDotnetStuff\WebsiteCrawler\WebsiteCrawler.Service\SingleThreadedWebSiteCrawler.cs:line 54
   at WebsiteCrawler.Executable.Program.Run(CmdLineArgumentModel arg) in C:\Users\saurabhd\MyTrials\AnyDotnetStuff\WebsiteCrawler\WebsiteCrawler.Executable\Program.cs:line 57
   at CommandLine.ParserResultExtensions.WithParsedAsync[T](ParserResult`1 result, Func`2 action)
   at WebsiteCrawler.Executable.Program.Main(String[] args) in C:\Users\saurabhd\MyTrials\AnyDotnetStuff\WebsiteCrawler\WebsiteCrawler.Executable\Program.cs:line 46
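
Possible mitigation (a sketch, not the repository's actual code): the trace shows the exception coming from the `System.Uri(Uri baseUri, String relativeUri)` constructor called inside `UrlExtensions.Combine`, and the logged link carries leading spaces and a trailing line break. Whether that whitespace is the real cause is not confirmed here. Assuming it is at least a contributing factor, a defensive variant could trim the link and use `Uri.TryCreate`, which returns `false` instead of throwing. The class name `UrlExtensionsSketch` and the method `TryCombine` are hypothetical and exist only for illustration.

```csharp
using System;

// Hypothetical helper for illustration; not the Combine method from the repository.
public static class UrlExtensionsSketch
{
    // Tries to resolve childUrl against parentUrl.
    // Returns false instead of throwing when either value cannot be parsed.
    public static bool TryCombine(string parentUrl, string childUrl, out Uri combined)
    {
        combined = null;

        if (string.IsNullOrWhiteSpace(parentUrl) || string.IsNullOrWhiteSpace(childUrl))
        {
            return false;
        }

        // Remove surrounding whitespace such as the leading spaces and the
        // trailing line break seen in the logged link (assumed to matter).
        string cleaned = childUrl.Trim();

        // The parent must be an absolute URL to act as a base.
        if (!Uri.TryCreate(parentUrl, UriKind.Absolute, out Uri baseUri))
        {
            return false;
        }

        // Uri.TryCreate reports failure through its return value,
        // so a malformed link does not surface as UriFormatException.
        return Uri.TryCreate(baseUri, cleaned, out combined);
    }
}
```

With a helper like this, the loop in `DiscoverLinks` could log and skip any link for which `TryCombine` returns `false`, so a single malformed link would not abort the whole crawl.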