remarkablemark/html-react-parser

Does html-react-parser strip out XSS?

dave-stevens-net opened this issue · 15 comments

I want to use html-react-parser to sanitize and parse HTML from my CMS. Does it effectively sanitize the input against XSS attacks? This answer claims that it does: https://stackoverflow.com/questions/29044518/safe-alternative-to-dangerouslysetinnerhtml#answer-48261046. If so, I think it would be great to document / advertise this somewhere in the README. Thanks for your work on this.

Great question @dave-stevens-net!

Unfortunately it doesn't. The reason is that I chose to make this library flexible rather than strict.

Although there is the replace option, checking against all possible attacks may be too much. I recommend instead using an XSS sanitizer with dangerouslySetInnerHTML.

Good to know. Thanks for the quick response.

You're very welcome. If this answers your question @dave-stevens-net, can the issue be closed?

@dave-stevens-net I may have misspoken earlier about this library not being XSS safe.

I originally thought this library wasn't XSS-safe because dangerouslySetInnerHTML was relied on here.

However, it seems that I'm unable to reproduce any XSS vulnerabilities. See my fiddle, which is based off of this example.

Let me know if you have any luck in reproducing XSS attacks.

I managed to reproduce a simple XSS attack. There might be more.

Check my fiddle.

I found it in here https://www.in-secure.org/misc/xss/xss.html

I ended up coding a Sanitize component using the sanitize-html package dependency.

import React from 'react'
import sanitizeHtml from 'sanitize-html'

const Sanitize = ({ html }) => {
    const clean = sanitizeHtml(html, {
        allowedTags: sanitizeHtml.defaults.allowedTags.concat(['img', 'span']),
        allowedAttributes: {
           ...
        },
    })
    return (
        <span
            className="sanitized-html"
            dangerouslySetInnerHTML={{ __html: clean }}
        />
    )
}
export default Sanitize

Example usage:

<Sanitize html={data.wordpressPage.title} />

@harveydf Great find! Thanks for creating and sharing the fiddle.

I'll update the README.md to note that this library isn't XSS safe.

I didn't want to use sanitize-html because it's massive. I used dompurify instead; it's 10 times smaller and doesn't remove CSS.

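import parse, { domToReact } from 'html-react-parser'
import DOMPurify from 'dompurify'
import React from 'react'

// Placeholder replace callback (body omitted here); returning undefined leaves the node unchanged.
export function replaceNode() {}

// Sanitize the HTML string with DOMPurify first, then hand the clean markup to the parser.
export default function html(html, opts = {}) {
  return parse(DOMPurify.sanitize(html), {
    replace: replaceNode,
    // Caller-supplied options are spread last, so they override the default above.
    ...opts,
  })
}

html('<iframe src=javascript:alert("xss")></iframe>')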

Thanks for sharing your approach using dompurify @k1sul1!

I created a Repl.it demo based on your example.

I managed to reproduce a simple XSS attack. There might be more.

Check my fiddle.

I found it in here https://www.in-secure.org/misc/xss/xss.html

Hey I know this is a pretty old comment but I just wanted to point out that this isn't actually an XSS issue since the JavaScript is running within the iframe. If you change the html to <iframe src=javascript:alert(location.href)></iframe>, you'll see that the URL it's running on is about:blank rather than the host page.

In the replace function, you can check domNode.name... so wouldn't it be inherently not possible to embed a script tag or iframe there if you just check if (['script', 'iframe'].includes(domNode.name)) return null ?

@alexgleason there are many other ways to do XSS without <script> or <iframe>. For example:

<a onmouseover="alert()">xss</a>

Take a look at https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html
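To make the gap concrete, here is a minimal sketch of the tag-name check suggested above, run against the onmouseover example. The helper names (BLOCKED_TAGS, isBlockedTag, hasEventHandlerAttribute) are hypothetical and not part of html-react-parser, and the plain objects are simplified stand-ins that assume only the name and attribs fields of the node passed to replace.

```javascript
// Hypothetical helpers for illustration only; not part of html-react-parser.
const BLOCKED_TAGS = ['script', 'iframe'];

// The tag-name check proposed above.
function isBlockedTag(domNode) {
  return BLOCKED_TAGS.includes(domNode.name);
}

// A check the blocklist lacks: inline event handler attributes
// such as onclick or onmouseover.
function hasEventHandlerAttribute(domNode) {
  return Object.keys(domNode.attribs || {}).some((attr) =>
    attr.toLowerCase().startsWith('on')
  );
}

// Simplified stand-ins for parsed nodes (assumed shape: name + attribs).
const scriptNode = { name: 'script', attribs: {} };
const anchorNode = { name: 'a', attribs: { onmouseover: 'alert()' } };

console.log(isBlockedTag(scriptNode)); // true
console.log(isBlockedTag(anchorNode)); // false: the blocklist lets it through
console.log(hasEventHandlerAttribute(anchorNode)); // true
```

Even with both checks, this is not a sanitizer: other vectors from the cheat sheet, such as javascript: URLs in href, would still get through, which is why a dedicated sanitizer was recommended earlier in this thread.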

Ahh... that makes sense.

What I'm really trying to figure out is if this library is any worse than dangerouslySetInnerHTML. Is there a new attack surface outside of what's already possible with dangerouslySetInnerHTML?

@alexgleason you should treat this library the same as dangerouslySetInnerHTML if you don't sanitize the HTML string.

Thank you for clarifying. A friend of mine got burned by this one earlier this year, so now I am extra paranoid:

@graf does btrfly support pleroma <a href='\r\nd&#x61t&#x61:text/html,<scr&#x69pt></scr&#x69pt\" src=\"https://i.poastcdn.org/b2977f2d97f598d2ebd6dcf37afd9047b5da2b6dc95a7b2824fb111c906fb117.js\" hidden'></a>

Fortunately I can't reproduce the attack using this library. I just gave it a try.

They were using a custom HTML parser that was vulnerable. This library seems to use the browser's DOMParser when it's available. Therefore, I conclude it's no less secure than using dangerouslySetInnerHTML directly.