Crawl always starts from server root
Closed this issue · 3 comments
GoogleCodeExporter commented
When calling skipfish as:
./skipfish -o ../out https://test.com/foo/bar/baz.html
The crawler always starts from https://test.com/, ignoring the path and
parameters (and, judging from the code in database.c, it seems to do this
every time a link points to a new host).
I'd like to submit a patch to change this behavior (via a command-line
switch), but before I do, I'd like to understand the rationale for the
current code, so as not to break any useful use case.
Best regards,
Mattia
Original issue reported on code.google.com by mattiaba...@gmail.com
on 1 Jul 2013 at 8:45
GoogleCodeExporter commented
There is a separate command-line parameter to limit the scan to a specific path
(or to exclude specific paths). Without it, the scanner simply takes any number
of "seed" URLs on the command line but brute-forces the entire site. All of the
seed URLs should still get crawled, just not right away.
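For example, a minimal sketch (the hostname and paths are the placeholders from the original report; -I/-X behavior is as described in the skipfish documentation):
# Without -I/-X the whole site is still brute-forced, even with a deep seed URL:
./skipfish -o ../out https://test.com/foo/bar/baz.html
# Restrict active testing to one subtree instead:
./skipfish -o ../out -I /foo/bar/ https://test.com/foo/bar/baz.html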
Original comment by lcam...@google.com
on 1 Jul 2013 at 8:56
GoogleCodeExporter commented
To expand on what Michal said: using -I /foo/bar/ for explicit inclusion limits
the active testing to /foo/bar/*.
Are you concerned about / or /foo/ being actively tested? That should not
happen with -I. Or is there a different problem?
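Exclusion works the same way; a sketch (here /foo/bar/old/ is a hypothetical subdirectory used only for illustration):
# Actively test /foo/bar/* but skip one subdirectory inside it:
./skipfish -o ../out -I /foo/bar/ -X /foo/bar/old/ https://test.com/foo/bar/baz.html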
Original comment by niels.he...@gmail.com
on 2 Jul 2013 at 8:18
GoogleCodeExporter commented
Original comment by niels.he...@gmail.com
on 17 Nov 2013 at 8:16
- Changed state: Invalid