A library for safely handling various kinds of user input. The idea is to provide a simple, convenient API that builds upon existing, proven libraries such as JSON Sanitizer, HTML Sanitizer, and PegDown.
Add autoclave to the :dependencies in your Leiningen project.clj:
:dependencies [[autoclave "0.1.7"]]
Then require the core namespace:
(require '[autoclave.core :refer :all])
The json-sanitize function takes a string containing JSON-like content and produces well-formed JSON. It corrects minor mistakes in encoding and makes the output easier to embed in HTML and XML documents.
(json-sanitize "{some: 'derpy json' n: +123}")
; "{\"some\": \"derpy json\" ,\"\":\"n\" ,\"\":123}"
More information, quoted from the JSON Sanitizer documentation:
The output is well-formed JSON as defined by RFC 4627. The output satisfies four additional properties:
- The output will not contain the substring "</script" (case-insensitively), so it can be embedded inside an HTML script element without further encoding.
- The output will not contain the substring "]]>", so it can be embedded inside an XML CDATA section without further encoding.
- The output is a valid Javascript expression, so it can be parsed by Javascript's eval builtin (after being wrapped in parentheses) or by JSON.parse. Specifically, the output will not contain any string literals with embedded JS newlines (U+2028 Line separator or U+2029 Paragraph separator).
- The output contains only valid Unicode scalar values (no isolated UTF-16 surrogates) that are allowed in XML unescaped.
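If you want to check these guarantees in your own tests, here's a rough sketch. embed-safe? and lone-surrogate? are hypothetical helpers, not part of autoclave, and they only cover the "</script", "]]>", and U+2028/U+2029 properties plus surrogate pairing; they don't check every character that XML disallows.

```clojure
(defn lone-surrogate?
  "Hypothetical helper: true if s contains a UTF-16 surrogate char that is
  not part of a high/low surrogate pair."
  [^String s]
  (let [n (count s)]
    (loop [i 0]
      (if (>= i n)
        false
        (let [c (.charAt s i)]
          (cond
            ;; A high surrogate is fine only when a low surrogate follows it.
            (Character/isHighSurrogate c)
            (if (and (< (inc i) n)
                     (Character/isLowSurrogate (.charAt s (inc i))))
              (recur (+ i 2))
              true)
            ;; A low surrogate on its own is always unpaired.
            (Character/isLowSurrogate c) true
            :else (recur (inc i))))))))

(defn embed-safe?
  "Hypothetical check (not part of autoclave) for a subset of the
  properties listed above."
  [^String s]
  (not (or (re-find #"(?i)</script" s)    ; safe inside an HTML script element
           (.contains s "]]>")            ; safe inside an XML CDATA section
           (re-find #"[\u2028\u2029]" s)  ; no raw JS line/paragraph separators
           (lone-surrogate? s))))         ; no isolated UTF-16 surrogates
```

In a test you might run it over sanitized output, e.g. (embed-safe? (json-sanitize input)).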
Since JSON Sanitizer isn't available from Maven Central or Clojars or any other repositories that I know of, its source is included locally, unmodified, in src/java/com/google/json.
By default, the html-sanitize function strips all HTML from a string.
(html-sanitize "Hello, <script>alert(\"0wn3d\");</script>world!")
; "Hello, world!"
You can create policies using html-policy to whitelist certain HTML elements and attributes with fine-grained control.
(def policy (html-policy :allow-elements ["a"]
                         :allow-attributes ["href" :on-elements ["a"]]
                         :allow-standard-url-protocols
                         :require-rel-nofollow-on-links))
(html-sanitize policy "<a href=\"http://github.com/\">GitHub</a>")
; "<a href=\"http://github.com/\" rel=\"nofollow\">GitHub</a>"
Here are the available options (adapted from the HTML Sanitizer documentation):
:allow-attributes [& attr-names attr-options]
  Allow specific attributes. The following options are available:
  :globally
    Allow the specified attributes to appear on all elements.
  :matching [pattern]
    Allow only values that match the provided regular expression (java.util.regex.Pattern).
  :matching [f]
    Allow the named attributes for which (f element-name attr-name value) returns a non-nil, possibly adjusted value.
  :on-elements [& element-names]
    Allow the named attributes only on the named elements.
:allow-common-block-elements
  Allows p, div, h[1-6], ul, ol, li, and blockquote.
:allow-common-inline-formatting-elements
  Allows b, i, font, s, u, o, sup, sub, ins, del, strong, strike, tt, code, big, small, br, and span elements.
:allow-elements [f & element-names]
  Allow the named elements for which (f element-name ^java.util.List attrs) returns a non-nil, possibly adjusted element-name.
:allow-elements [& element-names]
  Allow the named elements.
:allow-standard-url-protocols
  Allows http, https, and mailto to appear in URL attributes.
:allow-styling
  Converts style attributes to simple font tags to allow color, size, typeface, and other styling.
:allow-text-in [& element-names]
  Allow text in the named elements.
:allow-url-protocols [& url-protocols]
  Allow the given URL protocols.
:allow-without-attributes [& element-names]
  Allow the named elements to appear without any attributes.
:disallow-attributes [& attr-names attr-options]
  Disallow the named attributes. See :allow-attributes for available options.
:disallow-elements [& element-names]
  Disallow the named elements.
:disallow-text-in [& element-names]
  Disallow text to appear in the named elements.
:disallow-url-protocols [& url-protocols]
  Disallow the given URL protocols.
:disallow-without-attributes [& element-names]
  Disallow the named elements to appear without any attributes.
:require-rel-nofollow-on-links
  Require rel="nofollow" in links (adding it if not present).
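The two callback forms above (the f in :matching [f] and in :allow-elements [f & element-names]) are plain Clojure functions, so you can write and test them on their own. Here's a sketch of one of each. clamp-dimension and heading->div are hypothetical names; the only assumptions beyond the option list are that attrs holds alternating attribute names and values, and that returning nil rejects the value, as the descriptions above say.

```clojure
(defn clamp-dimension
  "Hypothetical value function for :matching [f]. Accepts a whole number
  from 1 to 1000 (surrounding whitespace allowed) and returns it
  normalized; returns nil to reject the attribute."
  [_element-name _attr-name value]
  (when-let [[_ digits] (re-matches #"\s*(\d{1,4})\s*" value)]
    (let [n (Long/parseLong digits)]
      (when (<= 1 n 1000)
        (str n)))))

(defn heading->div
  "Hypothetical element function for :allow-elements [f & element-names].
  Rewrites an element such as h2 into a div and records the original name
  in a class attribute. attrs is a mutable java.util.List of alternating
  attribute names and values."
  [element-name ^java.util.List attrs]
  (.add attrs "class")
  (.add attrs (str "header-" element-name))
  "div")
```

Keeping the logic in named functions like this also makes it easy to reuse one callback across several policies.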
Several policies come predefined for convenience. You can access them using the html-policy or html-merge-policies functions (see below).
(def policy (html-policy :BLOCKS))
:BLOCKS
  Allows common block elements, as in :allow-common-block-elements.
:FORMATTING
  Allows common inline formatting elements, as in :allow-common-inline-formatting-elements.
:IMAGES
  Allows img tags with alt, src, border, height, and width attributes, with appropriate restrictions.
:LINKS
  Allows a tags with standard URL protocols and rel="nofollow".
:STYLES
  Allows simple styling, as in :allow-styling.
You can merge policies using html-merge-policies. Provide it with predefined policy keywords, option sequences, or PolicyFactory objects (such as those returned by html-policy).
(def policy (html-merge-policies :BLOCKS :FORMATTING :LINKS))
Yes, there's already a PegDown wrapper for Clojure (called cegdown). But this one's got a few more features and I'm including it for the sake of completeness.
By default, the markdown-to-html function simply adheres to the original Markdown specification.
(markdown-to-html "# Hello, \"<em>world</em>\"")
; "<h1>Hello, \"<em>world</em>\"</h1>"
The markdown-processor function returns a processor factory with the specified behavior. Suppose, for example, you wanted typographic quotes and no user-supplied HTML:
(def processor (markdown-processor :quotes
                                   :suppress-all-html))
(markdown-to-html processor "# Hello, \"<em>world</em>\"")
; "<h1>Hello, “world”</h1>"
Processors created with markdown-processor are also thread-safe.
Here are the available options (adapted from the PegDown documentation):
:abbreviations
  Enable abbreviations.
:all
  Enable all extensions, excluding the :suppress-* ones.
:autolinks
  Enable automatic linking of URLs.
:definitions
  Enable definition lists.
:fenced-code-blocks
  Enable fenced code blocks (two different syntaxes are supported).
:hardwraps
  Enable interpretation of single newlines as hardwraps.
:none
  Don't enable any extensions (default).
:quotes
  Turn single quotes, double quotes, and angle quotes into fancy entities.
:smarts
  Turn ellipses, dashes, and apostrophes into fancy entities.
:smartypants
  Enable :quotes and :smarts.
:strikethrough
  Enable strikethrough.
:suppress-all-html
  Enable both :suppress-html-blocks and :suppress-inline-html.
:suppress-html-blocks
  Suppress user-supplied block HTML tags.
:suppress-inline-html
  Suppress user-supplied inline HTML tags.
:tables
  Enable tables.
:wikilinks
  Enable [[wiki-style links]] (see below for more information).
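The composite options (:all, :none, :smartypants, :suppress-all-html) are shorthand for combinations of the others. The sketch below models those relationships as plain data so you can see which underlying flags a given option list turns on; expand-options, composites, and extensions are hypothetical names, and this is not how autoclave resolves options internally.

```clojure
(require '[clojure.set :as set])

;; Hypothetical model of the composite options listed above.
(def composites
  {:smartypants       #{:quotes :smarts}
   :suppress-all-html #{:suppress-html-blocks :suppress-inline-html}})

;; Everything :all turns on (it excludes the :suppress-* options).
(def extensions
  #{:abbreviations :autolinks :definitions :fenced-code-blocks :hardwraps
    :quotes :smarts :strikethrough :tables :wikilinks})

(defn expand-options
  "Returns the set of individual flags enabled by a sequence of options."
  [opts]
  (reduce (fn [acc opt]
            (cond
              (= opt :all)               (set/union acc extensions)
              (= opt :none)              acc
              (contains? composites opt) (set/union acc (composites opt))
              :else                      (conj acc opt)))
          #{}
          opts))
```

For instance, the :quotes :suppress-all-html processor from the earlier example corresponds to three individual flags.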
You can customize how automatic, explicit (or inline), mail, reference, and wiki links are rendered by supplying your own LinkRenderer. The markdown-link-renderer function provides a nicer way to proxy it.
(require '[clojure.string :as str])

(def link-renderer
  (markdown-link-renderer
    {:auto (fn [node]
             {:text (->> (.getText node)
                         (re-find #"://(\w+)\.")
                         second
                         str/capitalize)
              :href (.getText node)
              :attributes ["class" "autolink"]})}))
(def processor (markdown-processor :autolinks))
(markdown-to-html processor link-renderer "http://google.com")
; "<a href=\"http://google.com\" class=\"autolink\">Google</a>"
The available overrides are (adapted from the PegDown documentation):
:auto [^AutoLinkNode node]
:explicit [^ExpLinkNode node ^String text]
:explicit-image [^ExpImageNode node ^String text]
:mail [^MailLinkNode node]
:reference [^RefLinkNode node ^String url ^String title ^String text]
:reference-image [^RefImageNode node ^String url ^String title ^String text]
:wiki [^WikiLinkNode node]
Each override should return a map containing the link's :text, :href, and any other :attributes (as a flat sequence of strings, alternating names and values), as in the example above.
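Since the :auto function in the example above only needs the node's text, you can keep the map-building logic in a pure function and test it without constructing PegDown nodes. auto-link-map below is a hypothetical helper; unlike the example, it falls back to the full URL when the pattern doesn't match instead of passing nil to capitalize.

```clojure
(require '[clojure.string :as str])

(defn auto-link-map
  "Hypothetical helper: builds the map a link-renderer override returns
  (:text, :href, and a flat :attributes sequence) from a URL string."
  [^String url]
  {:text       (or (some->> url
                            (re-find #"://(\w+)\.")
                            second
                            str/capitalize)
                   url)
   :href       url
   :attributes ["class" "autolink"]})

;; Possible wiring, following the markdown-link-renderer example above:
;; (markdown-link-renderer {:auto (fn [node] (auto-link-map (.getText node)))})
```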
Copyright © 2013 Alex Little
Distributed under the Eclipse Public License, the same as Clojure.