Start crawl sends wrong seed to the crawler
aecio opened this issue · 2 comments
When DDT sends the URL to the crawler, it appends the string ",1" to the end of the seed URL. Maybe that string is the count of URLs shown in the recommendations box.
This does not seem to be the case. The following ACHE crawler messages, logged when URLs are added, reiterate this:
[2017-08-03 15:50:34,238] INFO [qtp597874846-15] (FrontierManager.java:236) - Adding 3 seed URL(s)...
[2017-08-03 15:50:34,320] INFO [qtp597874846-15] (FrontierManager.java:248) - Added seed URL: http://answers.yahoo.com/dir/index/discover?sid=396545327
[2017-08-03 15:50:34,320] INFO [qtp597874846-15] (FrontierManager.java:248) - Added seed URL: http://answers.yahoo.com/dir/index/discover?sid=396545433
[2017-08-03 15:50:34,321] INFO [qtp597874846-15] (FrontierManager.java:248) - Added seed URL: http://answers.yahoo.com/
This issue is still happening, though it is not always appending ",1". Right now I'm seeing that it appended "1" to the URLs shown in "Crawling View" -> "Deep Crawling" -> "Domains for crawling".
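For anyone who wants to check a seed list for the ",1" symptom described above, here is a minimal sketch. It is not DDT or ACHE code: the helper names and the pattern are hypothetical, and it assumes the stray suffix is always a comma followed by digits at the very end of the URL. It does not catch the bare "1" case, since a legitimate URL can end in a digit.

```python
import re

# Hypothetical pattern: a comma followed by one or more digits at the end of the URL.
_TRAILING_COUNT = re.compile(r",\d+$")


def has_count_suffix(url: str) -> bool:
    """Return True if the URL ends with a suffix such as ',1'."""
    return bool(_TRAILING_COUNT.search(url))


def strip_count_suffix(url: str) -> str:
    """Return the URL with a trailing ',<digits>' suffix removed, if present."""
    return _TRAILING_COUNT.sub("", url)


if __name__ == "__main__":
    seeds = [
        "http://answers.yahoo.com/,1",
        "http://answers.yahoo.com/dir/index/discover?sid=396545327",
    ]
    for seed in seeds:
        print(seed, has_count_suffix(seed), strip_count_suffix(seed))
```

A URL that legitimately ends in a comma and digits would also match, so this is only useful for spotting the symptom, not as a fix for whatever is appending the suffix.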