developit/snarkdown

Document or Prevent XSS Security Issues

FlorianWendelborn opened this issue · 7 comments

E.g. <img src onerror="alert(1)"/> will execute arbitrary JavaScript. It would be good to either document this very explicitly or to prevent this security issue from ever happening.


Update: Non-HTML Vulnerability: #70 (comment)

Hey, could you explain it with a little more detail? I don't see how this has to do with this markdown-parser. It is parsing some input and this <img src onerror="alert(1)"/> is valid markdown. Maybe one could add this to the readme.md.

@najtin sure. Basically, my point is that very often, if markdown is used in real-world apps, it’s used to parse user-generated content (like these comments we’re writing here). Most developers don’t explicitly go through OWASP’s list of potential security pitfalls every single time they implement anything.

So, what will most likely happen is that someone will use this library and not expect it to allow JavaScript to be executed when using the output. If this library doesn’t want to implement an explicit cross-site-scripting preventing mechanism, it should at least have a warning that implementing such a mechanism is always necessary when parsing and rendering user-generated markdown content.

Otherwise, developers will find a way to mess this up and risk their users’ and company’s security and image.

Other markdown parsers mention these issues in their readme or have (sometimes too simple) ways of mitigating them themselves, like disabling all HTML.

@najtin sure. Basically, my point is that very often, if markdown is used in real-world apps, it’s used to parse user-generated content (like these comments we’re writing here). Most developers don’t explicitly go through OWASP’s list of potential security pitfalls every single time they implement anything.

So, what will most likely happen is that someone will use this library and not expect it to allow JavaScript to be executed when using the output. If this library doesn’t want to implement an explicit cross-site-scripting preventing mechanism, it should at least have a warning that implementing such a mechanism is always necessary when parsing and rendering user-generated markdown content.

Otherwise, developers will find a way to mess this up and risk their users’ and company’s security and image.

Other markdown parsers mention these issues in their readme or have (sometimes too simple) ways of mitigating them themselves, like disabling all HTML.

Agree with you. This is really important.
The library should either explicitly state that the parser does NOT protect against XSS or implement a XSS feature itself.

Referring to other libraries in the README that help with XSS would be a nice addition too. There are a number of client side and server side solutions out there that fit the tiny & fast mantra of Snarkdown.

There are a number of client side and server side solutions out there that fit the tiny & fast mantra of Snarkdown.

Can you give some examples of smaller client side libs that help with xss?

@bboydflo not sure if it's helpful, but my original reason for using this library was to "pipe" its output to Preact, which requires using DOMParser to convert the generated HTML to Virtual DOM. Since this is then rendered using the imperative DOM API, it's relatively easy to implement XSS mitigation, though the same concept can be applied as a string-to-string transform. It won't be the fastest, but it avoids building in an HTML parser just for sanitization:

function safeMarkdown(markdown) {
  const html = snarkdown(markdown);
  const doc = new DOMParser().parseFromString(`<!DOCTYPE html><html><body>${html}`, 'text/html');
  doc.normalize();
  _sanitize(doc.body);
  return doc.body.innerHTML;
}
function _sanitize(node) {
  if (node.nodeType === 3) return;
  if (node.nodeType !== 1 || /^(script|iframe|object|embed|svg)$/i.test(node.tagName)) {
    return node.remove();
  }
  for (let i=node.attributes.length; i--; ) {
    const name = node.attributes[i].name;
    if (!/^(class|id|name|href|src|alt|align|valign)$/i.test(name)) {
      node.attributes.removeNamedItem(name);
    }
  }
  for (let i=node.childNodes.length; i--; ) _sanitize(node.childNodes[i]);
}

Here's the above running on JSFiddle: https://jsfiddle.net/developit/vrn16fsg/

retog commented

Removing all HTML from the markdown before passing it to snarkdown, would this render the output safe? Or could one cause a similar output by another valid markdown?

@retog your question made me curios, so I investigated it a bit.

Unfortunately, I just managed to XSS snarkdown without using HTML:

https://codesandbox.io/s/immutable-cloud-b66z48?file=/src/index.ts

In general, XSS prevention isn’t that easy. Here’s a somewhat complete list of prevention techniques: https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html

Since snarkdown has no plans of being XSS-proof I’d strongly recommend not using snarkdown for any user-provided data, and only for trusted markdown files.


in case codesandbox 404s, here’s the source code

import md from "snarkdown";
import e from "lodash.escape";

const t = "[XSS Me](javascript:alert`hello from xss`)";

document.getElementById("xss")!.innerHTML = md(t);
document.getElementById("txt")!.innerHTML = e(md(t));
<div id="txt"></div>
<div id="xss"></div>