autoclave

A Clojure library for safely handling various kinds of user input. The idea is to provide a simple, convenient API that builds upon existing, proven libraries such as JSON Sanitizer, HTML Sanitizer, and PegDown.

Installation

:dependencies [[autoclave "0.1.7"]]

Usage

(require '[autoclave.core :refer :all])

JSON

The json-sanitize function takes a string containing JSON-like content and produces well-formed JSON. It corrects minor mistakes in encoding and makes it easier to embed in HTML and XML documents.

(json-sanitize "{some: 'derpy json' n: +123}")
; "{\"some\": \"derpy json\" ,\"\":\"n\" ,\"\":123}"

More information, quoted from the JSON Sanitizer documentation:

The output is well-formed JSON as defined by RFC 4627. The output satisfies (four) additional properties:

  1. The output will not contain the substring (case-insensitively) </script so can be embedded inside an HTML script element without further encoding.
  2. The output will not contain the substring ]]> so can be embedded inside an XML CDATA section without further encoding.
  3. The output is a valid JavaScript expression, so can be parsed by JavaScript's eval builtin (after being wrapped in parentheses) or by JSON.parse. Specifically, the output will not contain any string literals with embedded JS newlines (U+2028 Line separator or U+2029 Paragraph separator).
  4. The output contains only valid Unicode scalar values (no isolated UTF-16 surrogates) that are allowed in XML unescaped.
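As a minimal sketch of what the first property buys you, the snippet below embeds sanitized JSON in an inline script element. The json-script-tag helper and the "config" variable name are hypothetical and not part of autoclave; the sketch also assumes the variable name itself comes from trusted code.

```clojure
(require '[autoclave.core :refer [json-sanitize]])

;; Hypothetical helper (not part of autoclave): wraps sanitized JSON in an
;; inline script element. var-name is assumed to be trusted; only the JSON
;; payload is treated as user input.
(defn json-script-tag
  [var-name untrusted-json]
  (str "<script>var " var-name " = "
       (json-sanitize untrusted-json)
       ";</script>"))

;; The payload tries to close the script element early. Because the sanitized
;; output never contains the substring </script, the only closing tag left in
;; the result is the one appended by the helper.
(json-script-tag "config"
                 "{\"note\": \"</script><script>alert(1)</script>\"}")
```

Sanitizing at the point of embedding, rather than when the JSON is first received, keeps the guarantee next to the place that depends on it.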

Since JSON Sanitizer isn't available from Maven Central or Clojars or any other repositories that I know of, its source is included locally, unmodified, in src/java/com/google/json.

HTML

By default, the html-sanitize function strips all HTML from a string.

(html-sanitize "Hello, <script>alert(\"0wn3d\");</script>world!")
; "Hello, world!"

Policies

You can create policies using html-policy to whitelist certain HTML elements and attributes with fine-grained control.

(def policy (html-policy :allow-elements ["a"]
                         :allow-attributes ["href" :on-elements ["a"]]
                         :allow-standard-url-protocols
                         :require-rel-nofollow-on-links))

(html-sanitize policy "<a href=\"http://github.com/\">GitHub</a>")
; "<a href=\"http://github.com/\" rel=\"nofollow\">GitHub</a>"

Here are the available options (adapted from the HTML Sanitizer documentation):

  • :allow-attributes [& attr-names attr-options]
    Allow specific attributes. The following options are available:
    • :globally
      Allow the specified attributes to appear on all elements.
    • :matching [pattern]
      Allow only values that match the provided regular expression (java.util.regex.Pattern).
    • :matching [f]
      Allow the named attributes for which (f element-name attr-name value) returns a non-nil, possibly adjusted value.
    • :on-elements [& element-names]
      Allow the named attributes only on the named elements.
  • :allow-common-block-elements
    Allows p, div, h[1-6], ul, ol, li, and blockquote.
  • :allow-common-inline-formatting-elements
    Allows b, i, font, s, u, o, sup, sub, ins, del, strong, strike, tt, code, big, small, br, and span elements.
  • :allow-elements [f & element-names]
    Allow the named elements for which (f element-name ^java.util.List attrs) returns a non-nil, possibly adjusted element-name. Here is an example.
  • :allow-elements [& element-names]
    Allow the named elements.
  • :allow-standard-url-protocols
    Allows http, https, and mailto to appear in URL attributes.
  • :allow-styling
    Convert style attributes to simple font tags to allow color, size, typeface, and other styling.
  • :allow-text-in [& element-names]
    Allow text in the named elements.
  • :allow-url-protocols [& url-protocols]
    Allow the given URL protocols.
  • :allow-without-attributes [& element-names]
    Allow the named elements to appear without any attributes.
  • :disallow-attributes [& attr-names attr-options]
    Disallow the named attributes. See :allow-attributes for available options.
  • :disallow-elements [& element-names]
    Disallow the named elements.
  • :disallow-text-in [& element-names]
    Disallow text to appear in the named elements.
  • :disallow-url-protocols [& url-protocols]
    Disallow the given URL protocols.
  • :disallow-without-attributes [& element-names]
    Disallow the named elements to appear without any attributes.
  • :require-rel-nofollow-on-links
    Require rel="nofollow" in links (adding it if not present).
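The attribute options above can be combined in a single :allow-attributes entry. The following is a sketch only: it assumes that :matching takes its pattern wrapped in a vector, by analogy with :on-elements ["a"] in the earlier example, and the policy name and the sample input are made up for illustration.

```clojure
(require '[autoclave.core :refer [html-policy html-sanitize]])

;; Allow <p> elements, and allow an align attribute on them only when its
;; value matches the pattern. Assumption: attribute options are written in
;; this order and :matching accepts a vector holding the pattern.
(def align-policy
  (html-policy :allow-elements ["p"]
               :allow-attributes ["align"
                                  :matching [#"(?i)left|right|center"]
                                  :on-elements ["p"]]))

;; onclick is not whitelisted by the policy, so it should be dropped, while
;; the align attribute should be kept if the assumptions above hold.
(html-sanitize align-policy
               "<p align=\"center\" onclick=\"evil()\">Hi</p>")
```

Restricting attribute values with a pattern is usually safer than allowing the attribute outright, since it limits what a user can smuggle into the value.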

Predefined policies

Several policies come predefined for convenience. You can access them using the html-policy or html-merge-policies functions (see below).

(def policy (html-policy :BLOCKS))

The following policies are predefined:

  • :BLOCKS
    Allows common block elements, as in :allow-common-block-elements.
  • :FORMATTING
    Allows common inline formatting elements as in :allow-common-inline-formatting-elements.
  • :IMAGES
    Allows img tags with alt, src, border, height, and width attributes, with appropriate restrictions.
  • :LINKS
    Allows a tags with standard URL protocols and rel="nofollow".
  • :STYLES
    Allows simple styling as in :allow-styling.

Merging policies

You can merge policies using html-merge-policies. Provide it with a sequence of option sequences or PolicyFactory objects (such as those returned by html-policy).

(def policy (html-merge-policies :BLOCKS :FORMATTING :LINKS))
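Predefined and custom policies can also be combined. The sketch below passes only PolicyFactory objects to html-merge-policies, which the description above says it accepts; the names link-policy and comment-policy and the sample input are illustrative.

```clojure
(require '[autoclave.core :refer [html-policy html-merge-policies html-sanitize]])

;; A custom policy for links, built from the options documented above.
(def link-policy
  (html-policy :allow-elements ["a"]
               :allow-attributes ["href" :on-elements ["a"]]
               :allow-standard-url-protocols
               :require-rel-nofollow-on-links))

;; Merge two predefined policies with the custom one. Each argument is a
;; policy object returned by html-policy.
(def comment-policy
  (html-merge-policies (html-policy :BLOCKS)
                       (html-policy :FORMATTING)
                       link-policy))

;; The script element is not allowed by any of the merged policies, so it
;; should be removed from the result.
(html-sanitize comment-policy
               "<p>Hello, <b>world</b><script>alert(1)</script></p>")
```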

Markdown

Yes, there's already a PegDown wrapper for Clojure (called cegdown). But this one's got a few more features and I'm including it for the sake of completeness.

By default, the markdown-to-html function simply adheres to the original Markdown specification.

(markdown-to-html "# Hello, \"<em>world</em>\"")
; "<h1>Hello, \"<em>world</em>\"</h1>"

Processors

The markdown-processor function returns a processor factory with the specified behavior. Suppose, for example, you wanted to suppress all user-supplied HTML:

(def processor (markdown-processor :quotes
                                   :suppress-all-html))

(markdown-to-html processor "# Hello, \"<em>world</em>\"")
; "<h1>Hello, &ldquo;world&rdquo;</h1>"

The processor returned by markdown-processor is also thread-safe.

Here are the available options (adapted from the PegDown documentation):

  • :abbreviations
    Enable abbreviations.
  • :all
    Enable all extensions, excluding the :suppress-* ones.
  • :autolinks
    Enable automatic linking of URLs.
  • :definitions
    Enable definition lists.
  • :fenced-code-blocks
    Enable fenced code blocks via two different syntaxes.
  • :hardwraps
    Enable interpretation of single newlines as hardwraps.
  • :none
    Don't enable any extensions (default).
  • :quotes
    Turn single and double quotes and angle quotes into fancy entities.
  • :smarts
    Turn ellipses, dashes, and apostrophes into fancy entities.
  • :smartypants
    Enable :quotes and :smarts.
  • :strikethrough
    Enable strikethrough.
  • :suppress-all-html
    Enable both :suppress-html-blocks and :suppress-inline-html.
  • :suppress-html-blocks
    Suppress user-supplied block HTML tags.
  • :suppress-inline-html
    Suppress user-supplied inline HTML tags.
  • :tables
    Enable tables.
  • :wikilinks
    Enable [[wiki-style links]] (see below for more information).
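Several of these options can be passed to markdown-processor at once, in the same way as the :quotes and :suppress-all-html example earlier. This is a small sketch; the name comment-processor and the input text are illustrative, and the exact HTML produced depends on PegDown.

```clojure
(require '[autoclave.core :refer [markdown-processor markdown-to-html]])

;; Enable a few extensions for user comments while suppressing raw HTML.
(def comment-processor
  (markdown-processor :autolinks
                      :strikethrough
                      :tables
                      :suppress-all-html))

;; The result is an HTML string. Strikethrough and automatic linking should
;; apply, and the user-supplied <b> tags should be suppressed.
(markdown-to-html comment-processor
                  "~~old~~ <b>new</b>, see http://example.com")
```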

Link renderers

You can customize how automatic, explicit (or inline), mail, reference, and wiki links are rendered by supplying your own LinkRenderer. The markdown-link-renderer function provides a nicer way to proxy it.

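(require '[clojure.string :refer [capitalize]])

(def link-renderer (markdown-link-renderer
                     {:auto (fn [node]
                              ;; Use the capitalized host name as the link text.
                              {:text (->> (.getText node)
                                          (re-find #"://(\w+)\.")
                                          second
                                          capitalize)
                               :href (.getText node)
                               :attributes ["class" "autolink"]})}))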

(def processor (markdown-processor :autolinks))

(markdown-to-html processor link-renderer "http://google.com")
; "<a href=\"http://google.com\" class=\"autolink\">Google</a>"

The available overrides are (adapted from the PegDown LinkRenderer documentation):

  • :auto [^AutoLinkNode node]
  • :explicit [^ExpLinkNode node ^String text]
  • :explicit-image [^ExpImageNode node ^String text]
  • :mail [^MailLinkNode node]
  • :reference [^RefLinkNode node ^String url ^String title ^String text]
  • :reference-image [^RefImageNode node ^String url ^String title ^String text]
  • :wiki [^WikiLinkNode node]

They should return a map containing the link's :text, :href, and any other :attributes (as a flat sequence of strings) as in the example above.
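The same pattern applies to the other overrides. Here is a sketch of a :wiki override, assuming the WikiLinkNode exposes the link's text through .getText as the auto link node does in the example above. The /wiki/ URL scheme, the wikilink class, and the names wiki-renderer and wiki-processor are hypothetical.

```clojure
(require '[autoclave.core :refer [markdown-link-renderer
                                  markdown-processor
                                  markdown-to-html]]
         '[clojure.string :as string])

;; Render [[Page Name]] as a link to /wiki/Page_Name (hypothetical URL scheme).
(def wiki-renderer
  (markdown-link-renderer
    {:wiki (fn [node]
             (let [title (.getText node)]
               {:text title
                :href (str "/wiki/" (string/replace title " " "_"))
                :attributes ["class" "wikilink"]}))}))

;; Wiki links are only recognized when the :wikilinks extension is enabled.
(def wiki-processor (markdown-processor :wikilinks))

(markdown-to-html wiki-processor wiki-renderer "See [[Main Page]].")
```

Overrides that are not supplied in the map are left out of this sketch, so only wiki links are affected by it.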

License

Copyright © 2013 Alex Little

Distributed under the Eclipse Public License, the same as Clojure.