kgretzky/dcrawl

resuming old thread

d4op opened this issue · 7 comments

d4op commented

There's a bug: it loads the old URLs found before and keeps crawling without problems,
but when I check the file after stopping dcrawl, nothing new has been added to it.

Can you see in the console output that it is adding new domains?

d4op commented

There is no other output; it says "added xxxx.com", etc.,
but the new ones aren't written to the old file.

@kgretzky Thank you for building this, it saves me a bunch of time.

I can confirm this is happening. There are no additional details in the console, but this is how you can replicate it (I have tried on both OS X 10.11.6 and Debian Stretch):

  • Started crawling, output to file
  • Output saved as supposed to
  • Stop crawling
  • Start crawling again. Shows progress as supposed to.
  • Nothing added to the file

Issue fixed. You can use https://github.com/ubogdan/dcrawl until the owner has time to merge it into the master branch.

The issue is fixed by changing the line that opens the output file to this:

fo, err := os.OpenFile(*output_file, os.O_RDWR|os.O_APPEND|os.O_CREATE, 0664)