This module implements an isomorphic sanitized HTML data type for Ampersand.js. On the server, Google's Gumbo HTML parser is used to parse and sanitize the HTML data. In the browser, the sanitized value is used when rendering user-generated content.
In a typical use case, a user submits HTML to the server. The server parses and sanitizes the HTML and returns the result to the client, which then renders the sanitized HTML. To avoid XSS attacks, even from the same user who generated the HTML content, only server-sanitized HTML should be trusted.
The security of the data type is dependent on the underlying libraries used to parse and sanitize the HTML:
- Gumbo for parsing HTML
- Gumbo Parser for using Gumbo in node & iojs
- Gumbo Sanitize for generating sanitized HTML
Security reviews and code contributions are welcome.
npm install --save ampersand-sanitized-html-data-type
var Model = require("ampersand-model");
var htmlMixin = require("ampersand-sanitized-html-data-type");
module.exports = Model.extend(htmlMixin, {
    props: {
        body: "html"
    }
});
Alternatively, you may also define the data type under a different name:
var Model = require("ampersand-model");
var htmlMixin = require("ampersand-sanitized-html-data-type");
module.exports = Model.extend(htmlMixin, {
    dataTypes: {
        sanitizedHtml: htmlMixin.dataTypes.html
    },
    props: {
        body: "sanitizedHtml"
    }
});
To use different options for Gumbo Sanitize, pass the appropriate string. The
supported values are "STRICT", "BASIC", and "RELAXED":
var Model = require("ampersand-model");
var htmlMixin = require("ampersand-sanitized-html-data-type")("RELAXED");
To provide custom options to Gumbo Sanitize, you may pass an object:
var Model = require("ampersand-model");
var htmlMixin = require("ampersand-sanitized-html-data-type")({
    secret: "xyzzy",
    elements: ["i"]
});
The configuration object is deterministically serialized and its SHA-1 hash
is used as the SHA-1 HMAC key for signing the raw HTML. The cryptographic
signature allows us to cache the sanitized HTML in the database while still
allowing updates to the sanitization options to re-sanitize the raw HTML on
read. The secret option "salts" the cryptographic signature to thwart
scenarios in which the attacker has compromised the application database.
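As a rough illustration of the signing scheme described above, the following
sketch uses Node's built-in crypto module. The helper names (stableStringify,
deriveKey, sign, isCacheValid) and the key-sorted JSON serialization are
assumptions made for this example; they are not taken from this module's
source, whose actual serialization format and storage layout may differ.

```javascript
var crypto = require("crypto");

// Serialize a value with object keys sorted, so that equal configurations
// produce the same string regardless of key insertion order.
// (Hypothetical helper; the module's real serializer may differ.)
function stableStringify(value) {
    if (Array.isArray(value)) {
        return "[" + value.map(stableStringify).join(",") + "]";
    }
    if (value !== null && typeof value === "object") {
        return "{" + Object.keys(value).sort().map(function (key) {
            return JSON.stringify(key) + ":" + stableStringify(value[key]);
        }).join(",") + "}";
    }
    return JSON.stringify(value);
}

// Derive the HMAC key from the SHA-1 hash of the serialized configuration.
// Because the secret is part of the configuration, it feeds into the key.
function deriveKey(config) {
    return crypto.createHash("sha1").update(stableStringify(config)).digest();
}

// Sign the raw HTML with HMAC-SHA1 under the configuration-derived key.
function sign(rawHtml, config) {
    return crypto.createHmac("sha1", deriveKey(config))
        .update(rawHtml)
        .digest("hex");
}

// A cached sanitized value is reusable only if the stored signature still
// matches the raw HTML under the current configuration.
function isCacheValid(rawHtml, storedSignature, config) {
    var expected = Buffer.from(sign(rawHtml, config), "hex");
    var actual = Buffer.from(storedSignature, "hex");
    return expected.length === actual.length &&
        crypto.timingSafeEqual(expected, actual);
}
```

Under these assumptions, changing any sanitization option or the secret
changes the derived key, so isCacheValid returns false for signatures created
under the old configuration and the raw HTML would be re-sanitized on read.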
Configuration options are ignored on the client.
On the client, the unsafe HTML data type will return a blank string after an
html property is updated with a new value. You should account for this
behavior when setting user-generated HTML values. When persisting data, use
the {wait: true} option with ampersand-model to ensure that the model always
returns sanitized HTML from the server, while still using the previously
sanitized HTML value until the server responds.
MIT