===========================================
skipfish - web application security scanner
===========================================

  http://code.google.com/p/skipfish/

  * Written and maintained by:

      Michal Zalewski <lcamtuf@google.com>
      Niels Heinen <heinenn@google.com>
      Sebastian Roschke <s.roschke@googlemail.com>

  * Copyright 2009 - 2012 Google Inc, rights reserved.

  * Released under terms and conditions of the Apache License, version 2.0.

--------------------
1. What is skipfish?
--------------------

Skipfish is an active web application security reconnaissance tool. It
prepares an interactive sitemap for the targeted site by carrying out a
recursive crawl and dictionary-based probes. The resulting map is then
annotated with the output from a number of active (but hopefully
non-disruptive) security checks. The final report generated by the tool is
meant to serve as a foundation for professional web application security
assessments.

-------------------------------------------------
2. Why should I bother with this particular tool?
-------------------------------------------------

A number of commercial and open source tools with analogous functionality are
readily available (e.g., Nikto, Nessus); stick to the one that suits you
best. That said, skipfish tries to address some of the common problems
associated with web security scanners. Specific advantages include:

  * High performance: 500+ requests per second against responsive Internet
    targets, 2000+ requests per second on LAN / MAN networks, and 7000+ requests
    per second against local instances have been observed, with a very modest
    CPU, network, and memory footprint. This can be attributed to:

    * Multiplexing single-thread, fully asynchronous network I/O and data
      processing model that eliminates memory management, scheduling, and IPC
      inefficiencies present in some multi-threaded clients.

    * Advanced HTTP/1.1 features such as range requests, content compression,
      and keep-alive connections, as well as forced response size limiting, to
      keep network-level overhead in check.

    * Smart response caching and advanced server behavior heuristics are used to
      minimize unnecessary traffic.

    * Performance-oriented, pure C implementation, including a custom
      HTTP stack.

  * Ease of use: skipfish is highly adaptive and reliable. The scanner features:

    * Heuristic recognition of obscure path- and query-based parameter handling
      schemes.

    * Graceful handling of multi-framework sites where certain paths obey
      completely different semantics, or are subject to different filtering
      rules.

    * Automatic wordlist construction based on site content analysis.

    * Probabilistic scanning features to allow periodic, time-bound assessments
      of arbitrarily complex sites.

  * Well-designed security checks: the tool is meant to provide accurate
    and meaningful results:

    * Handcrafted dictionaries offer excellent coverage and permit thorough
      $keyword.$extension testing in a reasonable timeframe.

    * Three-step differential probes are preferred to signature checks for
      detecting vulnerabilities.

    * Ratproxy-style logic is used to spot subtle security problems:
      cross-site request forgery, cross-site script inclusion, mixed content,
      MIME- and charset mismatches, incorrect caching directives, etc.

    * Bundled security checks are designed to handle tricky scenarios:
      stored XSS (path, parameters, headers), blind SQL or XML injection,
      or blind shell injection.

    * Snort-style content signatures highlight server errors, information
      leaks, or potentially dangerous web applications.

    * Report post-processing drastically reduces the noise caused by any
      remaining false positives or server gimmicks by identifying repetitive
      patterns.

That said, skipfish is not a silver bullet, and may be unsuitable for certain
purposes. For example, it does not satisfy most of the requirements outlined
in WASC Web Application Security Scanner Evaluation Criteria (some of them on
purpose, some out of necessity); and unlike most other projects of this type,
it does not come with an extensive database of known vulnerabilities for
banner-type checks.

-----------------------------------------------------
3. Most curious! What specific tests are implemented?
-----------------------------------------------------

A rough list of the security checks offered by the tool is outlined below.

  * High risk flaws (potentially leading to system compromise):

    * Server-side query injection (including blind vectors, numerical parameters).
    * Explicit SQL-like syntax in GET or POST parameters.
    * Server-side shell command injection (including blind vectors).
    * Server-side XML / XPath injection (including blind vectors).
    * Format string vulnerabilities.
    * Integer overflow vulnerabilities.
    * Locations accepting HTTP PUT.

  * Medium risk flaws (potentially leading to data compromise):

    * Stored and reflected XSS vectors in document body (minimal JS XSS support).
    * Stored and reflected XSS vectors via HTTP redirects.
    * Stored and reflected XSS vectors via HTTP header splitting.
    * Directory traversal / LFI / RFI (including constrained vectors).
    * Assorted file POIs (server-side sources, configs, etc).
    * Attacker-supplied script and CSS inclusion vectors (stored and reflected).
    * External untrusted script and CSS inclusion vectors.
    * Mixed content problems on script and CSS resources (optional).
    * Password forms submitting from or to non-SSL pages (optional).
    * Incorrect or missing MIME types on renderables.
    * Generic MIME types on renderables.
    * Incorrect or missing charsets on renderables.
    * Conflicting MIME / charset info on renderables.
    * Bad caching directives on cookie setting responses.

  * Low risk issues (limited impact or low specificity):

    * Directory listing bypass vectors.
    * Redirection to attacker-supplied URLs (stored and reflected).
    * Attacker-supplied embedded content (stored and reflected).
    * External untrusted embedded content.
    * Mixed content on non-scriptable subresources (optional).
    * HTTPS -> HTTP submission of HTML forms (optional).
    * HTTP credentials in URLs.
    * Expired or not-yet-valid SSL certificates.
    * HTML forms with no XSRF protection.
    * Self-signed SSL certificates.
    * SSL certificate host name mismatches.
    * Bad caching directives on less sensitive content.

  * Internal warnings:

    * Failed resource fetch attempts.
    * Exceeded crawl limits.
    * Failed 404 behavior checks.
    * IPS filtering detected.
    * Unexpected response variations.
    * Seemingly misclassified crawl nodes.

  * Non-specific informational entries:

    * General SSL certificate information.
    * Significantly changing HTTP cookies.
    * Changing Server, Via, or X-... headers.
    * New 404 signatures.
    * Resources that cannot be accessed.
    * Resources requiring HTTP authentication.
    * Broken links.
    * Server errors.
    * All external links not classified otherwise (optional).
    * All external e-mails (optional).
    * All external URL redirectors (optional).
    * Links to unknown protocols.
    * Form fields that could not be autocompleted.
    * Password entry forms (for external brute-force).
    * File upload forms.
    * Other HTML forms (not classified otherwise).
    * Numerical file names (for external brute-force).
    * User-supplied links otherwise rendered on a page.
    * Incorrect or missing MIME type on less significant content.
    * Generic MIME type on less significant content.
    * Incorrect or missing charset on less significant content.
    * Conflicting MIME / charset information on less significant content.
    * OGNL-like parameter passing conventions.

Along with a list of identified issues, skipfish also provides summary
overviews of document types and issue types found; and an interactive
sitemap, with nodes discovered through brute-force denoted in a distinctive
way.

NOTE: As a conscious design decision, skipfish will not redundantly complain
about highly non-specific issues, including but not limited to:

  * Non-httponly or non-secure cookies,
  * Non-HTTPS or autocomplete-enabled forms,
  * HTML comments detected on a page,
  * Filesystem path disclosure in error messages,
  * Server or framework version disclosure,
  * Servers supporting TRACE or OPTIONS requests,
  * Mere presence of certain technologies, such as WebDAV.

Most of these aspects are easy to inspect in a report if so desired - for
example, all the HTML forms are listed separately, so are new cookies or
interesting HTTP headers - and the expectation is that the auditor may opt to
make certain design recommendations based on this data where appropriate.
That said, these occurrences are not highlighted as a specific security flaw.

-----------------------------------------------------------
4. All right, I want to try it out. What do I need to know?
-----------------------------------------------------------

First and foremost, please do not be evil. Use skipfish only against services
you own, or have permission to test.

Keep in mind that all types of security testing can be disruptive. Although
the scanner is designed not to carry out malicious attacks, it may
accidentally interfere with the operations of the site. You must accept the
risk, and plan accordingly. Run the scanner against test instances where
feasible, and be prepared to deal with the consequences if things go wrong.

Also note that the tool is meant to be used by security professionals, and is
experimental in nature. It may return false positives or miss obvious
security problems - and even when it operates perfectly, it is simply not
meant to be a point-and-click application. Do not take its output at face
value.

Running the tool against vendor-supplied demo sites is not a good way to
evaluate it, as they usually approximate vulnerabilities very imperfectly; we
made no effort to accommodate these cases.

Lastly, the scanner is simply not designed for dealing with rogue and
misbehaving HTTP servers - and offers no guarantees of safe (or sane)
behavior there.

--------------------------
5. How to run the scanner?
--------------------------

To compile it, simply unpack the archive and try make. Chances are, you will
need to install libidn first.

Next, you need to read the instructions provided in doc/dictionaries.txt
to select the right dictionary file and configure it correctly. This step has a 
profound impact on the quality of scan results later on, so don't skip it.

Once you have the dictionary selected, you can use -S to load that dictionary,
and -W to specify an initially empty file for any newly learned site-specific
keywords (which will come in handy in future assessments):

$ touch new_dict.wl
$ ./skipfish -o output_dir -S existing_dictionary.wl -W new_dict.wl \
  http://www.example.com/some/starting/path.txt

You can use -W- if you don't want to store auto-learned keywords anywhere.

Note that you can provide more than one starting URL if so desired; all of
them will be crawled. It is also possible to read URLs from a file, using
the following syntax:

$ ./skipfish [...other options...] @../path/to/url_list.txt
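
A minimal sketch of preparing such a list is shown below. It assumes the file
holds one starting URL per line (the format is not spelled out here), and the
example.com paths are placeholders:

```shell
# Assumption: one starting URL per line; hosts and paths are placeholders.
cat > url_list.txt <<'EOF'
http://www.example.com/app1/
http://www.example.com/app2/start.html
EOF

# Sanity check before handing the file to the scanner via @url_list.txt.
echo "$(wc -l < url_list.txt | tr -d ' ') starting URLs listed"
```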

The tool will display some helpful stats while the scan is in progress. You
can also switch to a list of in-flight HTTP requests by pressing return.

In the example above, skipfish will scan the entire www.example.com
(including services on other ports, if linked to from the main page), and
write a report to output_dir/index.html. You can then view this report with
your favorite browser (JavaScript must be enabled; and because of recent
file:/// security improvements in certain browsers, you might need to access
results over HTTP). The index.html file is static; actual results are stored
as a hierarchy of JSON files, suitable for machine processing or different
presentation frontends if needs be. In addition, a list of all the discovered
URLs will be saved to a single file, pivots.txt, for easy postprocessing.
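
As a rough sketch of such postprocessing, the snippet below pulls anything
that looks like a URL out of a pivots file and counts entries per host. The
exact line layout of pivots.txt is not described here, so the sample file and
its extra fields are invented purely for illustration; for a real scan, point
the pipeline at output_dir/pivots.txt instead.

```shell
# Hypothetical sample; real pivots.txt lines may carry different fields.
cat > sample_pivots.txt <<'EOF'
GET http://www.example.com/ type=dir
GET http://www.example.com/login.php type=file
GET http://static.example.com/logo.png type=file
EOF

# Extract URL-looking tokens, reduce each to its host, and count per host.
grep -oE 'https?://[^ ]+' sample_pivots.txt \
  | awk -F/ '{ print $3 }' \
  | sort | uniq -c | sort -rn
```

Because the extraction only looks for http:// or https:// tokens, it should
not depend on the other fields present on each line.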

A simple companion script, sfscandiff, can be used to compute a delta for
two scans executed against the same target with the same flags. The newer
report will be non-destructively annotated by adding red background to all
new or changed nodes; and blue background to all new or changed issues
found.

Some sites require authentication; our support for this is described
in doc/authentication.txt. In most cases, you will want to use the
form authentication method, which is capable of detecting broken sessions
in order to re-authenticate.

Once authenticated, certain URLs on the site may log out your session;
you can combat this in two ways: by using the -N option, which causes
the scanner to reject attempts to set or delete cookies; or with the -X
parameter, which prevents matching URLs from being fetched:

$ ./skipfish -X /logout/logout.aspx ...other parameters...

The -X option is also useful for speeding up your scans by excluding /icons/,
/doc/, /manuals/, and other standard, mundane locations along these lines. In
general, you can use -X and -I (only spider URLs matching a substring) to
limit the scope of a scan any way you like - including restricting it only to
a specific protocol and port:

$ ./skipfish -I http://example.com:1234/ ...other parameters...

A related function, -K, allows you to specify parameter names not to fuzz
(useful for applications that put session IDs in the URL, to minimize noise).

Another useful scoping option is -D - allowing you to specify additional
hosts or domains to consider in-scope for the test. By default, all hosts
appearing in the command-line URLs are added to the list - but you can use -D
to broaden these rules, for example:

$ ./skipfish -D test2.example.com -o output-dir http://test1.example.com/

...or, for a domain wildcard match, use:

$ ./skipfish -D .example.com -o output-dir http://test1.example.com/

In some cases, you do not want to actually crawl a third-party domain, but
you trust the owner of that domain enough not to worry about cross-domain
content inclusion from that location. To suppress warnings, you can use the
-B option, for example:

$ ./skipfish -B .google-analytics.com -B .googleapis.com ...other parameters...

By default, skipfish sends minimalistic HTTP headers to reduce the amount of
data exchanged over the wire; some sites examine User-Agent strings or header
ordering to reject unsupported clients, however. In such a case, you can use
-b ie, -b ffox, or -b phone to mimic one of the two popular browsers (or
iPhone).

When it comes to customizing your HTTP requests, you can also use the -H
option to insert any additional, non-standard headers; or -F to define a
custom mapping between a host and an IP (bypassing the resolver). The latter
feature is particularly useful for not-yet-launched or legacy services.

Some sites may be too big to scan in a reasonable timeframe. If the site
features well-defined tarpits - for example, 100,000 nearly identical user
profiles as a part of a social network - these specific locations can be
excluded with -X or -S. In other cases, you may need to resort to other
settings: -d limits crawl depth to a specified number of subdirectories; -c
limits the number of children per directory; -x limits the total number of
descendants per crawl tree branch; and -r limits the total number of requests
to send in a scan.
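
The four limits can be combined in a single invocation. The sketch below only
assembles and prints the command (a dry run) rather than executing it; every
numeric value is an arbitrary placeholder, not a recommendation, and the
target URL is made up:

```shell
# Dry-run sketch: assemble and print the command instead of executing it.
# Every numeric limit here is an arbitrary placeholder, not a recommendation.
depth=5            # -d: maximum crawl depth (subdirectories)
children=200       # -c: maximum children per directory
descendants=2000   # -x: maximum descendants per crawl tree branch
requests=500000    # -r: maximum total requests in the scan

scan_cmd="./skipfish -d $depth -c $children -x $descendants -r $requests"
scan_cmd="$scan_cmd -W- -o output_dir http://www.example.com/"
echo "$scan_cmd"
```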

An interesting option is available for repeated assessments: -p. By
specifying a percentage between 1 and 100%, it is possible to tell the
crawler to follow fewer than 100% of all links, and try fewer than 100% of
all dictionary entries. This - naturally - limits the completeness of a scan,
but unlike most other settings, it does so in a balanced, non-deterministic
manner. It is extremely useful when you are setting up time-bound, but
periodic assessments of your infrastructure. Another related option is -q,
which sets the initial random seed for the crawler to a specified value. This
can be used to exactly reproduce a previous scan to compare results.
Randomness is relied upon most heavily in the -p mode, but also for making a
couple of other scan management decisions elsewhere.

Some particularly complex (or broken) services may involve a very high number
of identical or nearly identical pages. Although these occurrences are by
default grayed out in the report, they still use up some screen real estate
and take a while to process at the JavaScript level. In such extreme cases,
you may use the -Q option to suppress reporting of duplicate nodes altogether,
before the report is written. This may give you a less comprehensive
understanding of how the site is organized, but has no impact on test
coverage.

In certain quick assessments, you might also have no interest in paying any
particular attention to the desired functionality of the site - hoping to
explore non-linked secrets only. In such a case, you may specify -P to
inhibit all HTML parsing. This limits the coverage and takes away the ability
for the scanner to learn new keywords by looking at the HTML, but speeds up
the test dramatically. Another similarly crippling option that reduces the
risk of persistent effects of a scan is -O, which inhibits all form parsing
and submission steps.

Some sites that handle sensitive user data care about SSL - and about getting
it right. Skipfish may optionally assist you in figuring out problematic
mixed content or password submission scenarios - use the -M option to enable
this. The scanner will complain about situations such as http:// scripts
being loaded on https:// pages - but will disregard non-risk scenarios such
as images.

Likewise, certain pedantic sites may care about cases where caching is
restricted on HTTP/1.1 level, but no explicit HTTP/1.0 caching directive is
given. Specifying -E on the command line causes skipfish to log all such
cases carefully.

On some occasions, you may want to limit the number of requests per second
to reduce the load on the target's server (or possibly to bypass DoS
protection). The -l flag sets this limit; the value given is the maximum
number of requests per second you want skipfish to perform.

Scans typically should not take weeks. In many cases, you probably
want to limit the scan duration so that it fits within a certain time
window. This can be done with the -k flag, which takes the number of
hours, minutes, and seconds in H:M:S format. Use of
this flag can affect the scan coverage if the scan timeout occurs before
testing all pages.
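
If your time window is expressed in seconds (say, from a scheduler), a small
helper can produce the H:M:S string. This is only a sketch: the function name
to_hms is made up, and zero-padding the minutes and seconds is an assumption
about what -k accepts.

```shell
# Convert a number of seconds into the H:M:S form described for -k.
# Assumption: zero-padded minutes and seconds are accepted by the scanner.
to_hms() {
  secs=$1
  printf '%d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# A 90-minute window, printed as a -k argument.
echo "-k $(to_hms 5400)"
```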

Lastly, in some assessments that involve self-contained sites without
extensive user content, the auditor may care about any external e-mails or
HTTP links seen, even if they have no immediate security impact. Use the -U
option to have these logged.

Dictionary management is a special topic, and - as mentioned - is covered in
more detail in doc/dictionaries.txt. Please read that file before
proceeding. Some of the relevant options include -S and -W (covered earlier),
-L to suppress auto-learning, -G to limit the keyword guess jar size, -R to
drop old dictionary entries, and -Y to inhibit expensive $keyword.$extension
fuzzing.

Skipfish also features a form auto-completion mechanism in order to maximize
scan coverage. The values should be non-malicious, as they are not meant to
implement security checks - but rather, to get past input validation logic.
You can define additional rules, or override existing ones, with the -T
option (-T form_field_name=field_value, e.g. -T login=test123 -T
password=test321 - although note that -C and -A are a much better method of
logging in).
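
When there are many such rules, it may be easier to keep them in a file and
expand them into -T arguments. The sketch below assumes a hypothetical
form_values.txt with one name=value rule per line and values free of
whitespace; it prints the resulting command instead of running it.

```shell
# Hypothetical rules file: one field_name=field_value pair per line.
cat > form_values.txt <<'EOF'
login=test123
password=test321
EOF

# Turn each non-empty line into a "-T name=value" argument.
form_args=""
while IFS= read -r rule; do
  if [ -n "$rule" ]; then
    form_args="$form_args -T $rule"
  fi
done < form_values.txt

# Dry run: show the command that would be executed.
echo "./skipfish$form_args -W- -o output_dir http://www.example.com/"
```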

There is also a handful of performance-related options. Use -g to set the
maximum number of connections to maintain, globally, to all targets (it is
sensible to keep this under 50 or so to avoid overwhelming the TCP/IP stack
on your system or on the nearby NAT / firewall devices); and -m to set the
per-IP limit (experiment a bit: 2-4 is usually good for localhost, 4-8 for
local networks, 10-20 for external targets, 30+ for really lagged or
non-keep-alive hosts). You can also use -w to set the I/O timeout (i.e.,
skipfish will wait only so long for an individual read or write), and -t to
set the total request timeout, to account for really slow or really fast
sites.
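
As a starting point for that experimentation, the ranges above can be encoded
in a small helper. This is a sketch only: the function name suggest_m and the
target class labels are made up, and the returned values are simply picked
from within each quoted range.

```shell
# Suggest a starting -m value picked from within the ranges quoted above.
# The class names are illustrative labels, not scanner options.
suggest_m() {
  case "$1" in
    localhost) echo 3 ;;
    lan)       echo 6 ;;
    external)  echo 15 ;;
    lagged)    echo 30 ;;
    *)         echo "unknown target class: $1" >&2; return 1 ;;
  esac
}

# Global cap kept under 50, per-IP limit chosen for an external target.
echo "-g 40 -m $(suggest_m external)"
```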

Lastly, -f controls the maximum number of consecutive HTTP errors you are
willing to see before aborting the scan; and -s sets the maximum length of a
response to fetch and parse (longer responses will be truncated).

When scanning large, multimedia-heavy sites, you may also want to specify -e.
This prevents binary documents from being kept in memory for reporting
purposes, and frees up a lot of RAM.

Further rate-limiting is available through third-party user mode tools such
as trickle, or kernel-level traffic shaping.

Oh, and real-time scan statistics can be suppressed with -u.

--------------------------------
6. But seriously, how to run it?
--------------------------------

A standard, authenticated scan of a well-designed and self-contained site
(warns about all external links, e-mails, mixed content, and caching header
issues), including gentle brute-force:

$ touch new_dict.wl
$ ./skipfish -MEU -S dictionaries/minimal.wl -W new_dict.wl \
  -C "AuthCookie=value" -X /logout.aspx -o output_dir \
  http://www.example.com/

Five-connection crawl, but no brute-force; pretending to be MSIE and
trusting example.com content:

$ ./skipfish -m 5 -L -W- -o output_dir -b ie -B example.com \
  http://www.example.com/

Heavy brute force only (no HTML link extraction), limited to a single
directory and timing out after 5 seconds:

$ touch new_dict.wl
$ ./skipfish -S dictionaries/complete.wl -W new_dict.wl \
  -P -I http://www.example.com/dir1/ -o output_dir -t 5 \
  http://www.example.com/dir1/

For a short list of all command-line options, try ./skipfish -h.

----------------------------------------------------
7. How to interpret and address the issues reported?
----------------------------------------------------

Most of the problems reported by skipfish should be self-explanatory, assuming
you have a good grasp of the fundamentals of web security. If you need a quick
refresher on some of the more complicated topics, such as MIME sniffing, you
may enjoy our comprehensive Browser Security Handbook as a starting point:

  http://code.google.com/p/browsersec/

If you still need assistance, there are several organizations that put a
considerable effort into documenting and explaining many of the common web
security threats, and advising the public on how to address them. I encourage
you to refer to the materials published by OWASP and Web Application Security
Consortium, amongst others:

  * http://www.owasp.org/index.php/Category:Principle
  * http://www.owasp.org/index.php/Category:OWASP_Guide_Project
  * http://www.webappsec.org/projects/articles/

Although I am happy to diagnose problems with the scanner itself, I regrettably
cannot offer any assistance with the inner workings of third-party web
applications.

---------------------------------------
8. Known limitations / feature wishlist
---------------------------------------

Below is a list of features currently missing in skipfish. If you wish to
improve the tool by contributing code in one of these areas, please let me
know:

  * Buffer overflow checks: after careful consideration, I suspect there is
    no reliable way to test for buffer overflows remotely. Much like the actual
    fault condition we are looking for, proper buffer size checks may also
    result in uncaught exceptions, 500 messages, etc. I would love to be proved
    wrong, though.

  * Fully-fledged JavaScript XSS detection: several rudimentary checks are
    present in the code, but there is no proper script engine to evaluate
    expressions and DOM access built in.

  * Variable length encoding character consumption / injection bugs: these
    problems seem to be largely addressed on browser level at this point, so
    they were much lower priority at the time of this writing.

  * Security checks and link extraction for third-party, plugin-based
    content (Flash, Java, PDF, etc).

  * Password brute-force and numerical filename brute-force probes.

  * Search engine integration (vhosts, starting paths).

  * VIEWSTATE decoding.

  * NTLM and digest authentication.

  * More specific PHP tests (eval injection, RFI).

  * Proxy support: experimental HTTP proxy support is available through
    a #define directive in config.h. Adding support for HTTPS proxying is
    more complicated, and still in the works.

  * Scan resume option, better runtime info.

  * Standalone installation (make install) support.

  * Scheduling and management web UI.

-------------------------------------
9. Oy! Something went horribly wrong!
-------------------------------------

There is no web crawler so good that there wouldn't be a web framework to one
day set it on fire. If you encounter what appears to be bad behavior (e.g., a
scan that takes forever and generates too many requests, completely bogus
nodes in scan output, or outright crashes), please first check our known
issues page:

  http://code.google.com/p/skipfish/wiki/KnownIssues

If you can't find a satisfactory answer there, recompile the scanner with:

$ make clean debug

...and re-run it this way:

$ ./skipfish [...previous options...] 2>logfile.txt

You can then inspect logfile.txt to get an idea what went wrong; if it looks
like a scanner problem, please scrub any sensitive information from the log
file and send it to the author.
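
One possible first pass at scrubbing is sketched below. The format of the
debug log is not documented here, so the sample lines are invented and the
pattern only covers header-style lines that start with Cookie, Set-Cookie, or
Authorization; always review the result by hand before sending it.

```shell
# Hypothetical log excerpt; real debug output will likely look different.
cat > logfile_sample.txt <<'EOF'
Cookie: AuthCookie=s3cr3t; theme=dark
Authorization: Basic dXNlcjpwYXNz
GET /index.html HTTP/1.1
EOF

# Blank out the values of Cookie / Set-Cookie / Authorization lines.
sed -E 's/^((Set-)?Cookie|Authorization):.*/\1: [scrubbed]/' \
  logfile_sample.txt > logfile_scrubbed.txt
cat logfile_scrubbed.txt
```

Writing to a separate file keeps the original log intact in case you need to
refer back to it.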

If the scanner crashed, please recompile it as indicated above, and then type:

$ ulimit -c unlimited
$ ./skipfish [...previous options...] 2>logfile.txt
$ gdb --batch -ex back ./skipfish core

...and be sure to send the author the output of that last command as well.

------------------------
10. Credits and feedback
------------------------

Skipfish is made possible thanks to the contributions of, and valuable
feedback from, Google's information security engineering team.

If you have any bug reports, questions, suggestions, or concerns regarding
the application, the primary author can be reached at lcamtuf@google.com.