darylldoyle/svg-sanitizer

Requesting more details on GHSA-xrqq-wqh4-5hg2 (CVE-2023-28426)

Closed this issue · 10 comments

It seems that tag 0.16.0 did not actually fix a real security vulnerability, see corresponding advisory GHSA-xrqq-wqh4-5hg2 (CVE-2023-28426).

The provided new test SVG files in commit cce18bc still seem to be fine (encoded correctly) even when being processed with the previous version 0.15.4 of this library.

Invoked as svg.svg in browser, mime-type image/svg+xml

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" xml:space="preserve">
    <form action="javascript:alert('1')">
        <input type="submit"></input>
    </form>
    
        &lt;form action="javascript:alert('1')"&gt;
            &lt;input type="submit" onclick="javascript:alert('1')"/&gt;
        &lt;/form&gt;
    
</svg>

→ the <form> tag does not look nice, but is without any functionality inside an SVG context
→ the nested <form> tag in a CDATA section is correctly encoded (this is what the security advisory is referring to)
→ not a vulnerability

Invoked as svg.html in browser, mime-type text/htm

<html>
<body>
<div>
    <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" xml:space="preserve">
        <form action="javascript:alert('1')">
            <input type="submit"></input>
        </form>
        
            &lt;form action="javascript:alert('1')"&gt;
                &lt;input type="submit" onclick="javascript:alert('1')"/&gt;
            &lt;/form&gt;    
        
    </svg>
</div>
</body>
</html>

→ the <form> tag does not look nice, but is without any functionality inside an SVG context
→ the nested <form> tag in a CDATA section is correctly encoded (this is what the security advisory is referring to)
→ not a vulnerability

Conclusion & Post-review

  • disallowing non SVG tags from being processed (like the <form> tag) was handled in #74 and commit 355a65d - which was totally fine to be adjusted, but did not qualify as a security vulnerability
  • the remaining aspect of encoding CDATA sections was handled in a previous fixes already, see issue #71
  • introducing the package ezyang/htmlpurifier seems to be superfluous in this scenario

Disclaimer & Call for Action

@ohader I can see your point, the <form> issue may not be able to be executed but the background CSS issue does still stand.

If you run the following through 0.15.4, with the following settings, the JS will get executed.

<?xml version="1.0" standalone="yes"?>
<svg width="128px" height="128px" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1">
    <style type="text/css">
        <![CDATA[
            s {
                background : "<textarea></textarea><iframe/srcdoc=&lt;script/src=data:,try{parent.document.write(&#x27;\x3cb\x3eLocation:\x3c/b\x3e&#x27;+parent.location+&#x27;\x3cbr\x3e\x3cb\x3eUserAgent:\x3c/b\x3e&#x27;+navigator.userAgent+&#x27;\x3cbr\x3e\x3cb\x3eCookie:\x3c/b\x3e&#x27;+document.cookie+&#x27;\x3cbr\x3e\x3cb\x3eLocalStorage:\x3c/b\x3e&#x27;+JSON.stringify(localStorage)+&#x27;\x3cbr\x3e&#x27;)}catch(e){parent.document.write(e.message)}&gt;&lt;/script&gt;321></iframe>";
            }
        ]]>
    </style>
</svg>

Attempt to sanitize it with the following settings:

<?php
$sanitizer = new enshrined\svgSanitize\Sanitizer();
$sanitizer->removeRemoteReferences(true);
$sanitizer->minify(false);
?>
<html>
    <body>
        <?php echo $sanitizer->sanitize(file_get_contents('svg.svg')); ?>
    </body>
</html>

This is similar to what I was running previously on https://svg.enshrined.co.uk/ where the vulnerability was reported.

@darylldoyle Thanks for your feedback.

I'm referring to the green tests in PR #89 and https://github.com/darylldoyle/svg-sanitizer/pull/89/files#diff-a24025b3103aa5c0c40acacec37ad73bfa7621ff439a9984f7bc0e97ce78dfbf (tests/data/cdataTwoClean.svg) as shown in the screenshot (I cannot link the source diff view of a PR directly on GitHub):

Screenshot 2023-03-21 at 16 44 17

I observed the same behavior with our TYPO3 CI tests (which is still using v0.15.4 of the library):
https://review.typo3.org/c/Packages/TYPO3.CMS/+/78193/2/typo3/sysext/core/Tests/Functional/Resource/Fixtures/CleanSVG/cdataTwoTest.svg

The process at https://svg.enshrined.co.uk/ is wrapping the SVG result in a <textarea> block, where the correctly encoded SVG contents are visualized differently, but are not executed at all - example:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Textarea</title>
</head>
<body>
<textarea rows="30" cols="80">

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="128px" height="128px" version="1.1">
  <style type="text/css">
            s {
                background : "";
                background : "&lt;textarea&gt;&lt;/textarea&gt;&lt;iframe/srcdoc=&amp;lt;script/src=data:,try{parent.document.write(&amp;#x27;\x3cb\x3eLocation:\x3c/b\x3e&amp;#x27;+parent.location+&amp;#x27;\x3cbr\x3e\x3cb\x3eUserAgent:\x3c/b\x3e&amp;#x27;+navigator.userAgent+&amp;#x27;\x3cbr\x3e\x3cb\x3eCookie:\x3c/b\x3e&amp;#x27;+document.cookie+&amp;#x27;\x3cbr\x3e\x3cb\x3eLocalStorage:\x3c/b\x3e&amp;#x27;+JSON.stringify(localStorage)+&amp;#x27;\x3cbr\x3e&amp;#x27;)}catch(e){parent.document.write(e.message)}&amp;gt;&amp;lt;/script&amp;gt;321&gt;&lt;/iframe&gt;";
            }
        </style>
</svg>

</textarea>
</body>
</html>

Visualized like in the screenshot below, but that's just the text-content representation - no IFRAME instance is generated in the DOM (see the document.querySelector('iframe') which resolves to null):

Screenshot 2023-03-21 at 17 07 03

@ohader yep, I see what you mean. The issue with https://svg.enshrined.co.uk/ was that the <textarea></textarea> at the start of the background definition was breaking out of the TextArea that the clean SVG contents were output into, and that was then executing the <iframe> script within the HTML context of the page.

This does look like it's a false positive, and actually, there would be no issue reverting the HTMLPurifier addition.

Is the goal here just to reject the CVE, or is there a more significant reason to revert the HTMLPurifier addition? The library is pretty small.

In case we can guarantee that CDATA sections are correctly encoded - and I think they are, due to the changes of PR #72 which also had been confirmed back then by the reporter of CVE-2022-23638 - then using ezyang/htmlpurifier seems to be superfluous to me.

This library also has some side effects when trying to generate caches in read-only file systems, e.g. Warning: Directory ~/typo3/main/vendor/ezyang/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer not writable, please chmod to 777 in ~/typo3/main/vendor/ezyang/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer.php on line 297.

Rejecting the CVE helps those that currently are "blocked" by e.g. roave/security-advisories since they are still on v0.15.4 of this library. Thus, marking the CVE as invalid should also revert advisories like https://packagist.org/packages/enshrined/svg-sanitize/advisories?version=5989941 or GHSA-xrqq-wqh4-5hg2 - which are used as source for GitHub's Dependabot and similar tools as well.

Thus, for me the desired approach really would be like implemented in PR #89:

  • revert ezyang/htmlpurifier, but keep the changes for non-SVG tags
  • release a new version v0.16.1 of this library
  • reject CVE-2023-28426

However, I'm also curious to see a short statement from the reporter @Cyxow concerning these findings.

A web archive search for https://svg.enshrined.co.uk/ shows a version from November 2022. It seems the <textarea> containing the "Dirty SVG" was not properly encoded back then and has been adjusted recently on the website. Most probably that was the XSS that has been discovered - thus, correct, there was a XSS on the website(!) in the "Dirty SVG" - however the output of "Clean SVG" passed through the svg-sanitizer was fine.

Open https://web.archive.org/web/20221104235950/https://svg.enshrined.co.uk/ and check the HTML source of the "Dirty SVG" section, it was not encoded back then. That was the trigger... 😉

Cyxow commented

Hello, @ohader! I agree, it was my mistake, I really see that the vulnerability is really only on the site. Thank you for your attention to this report. Also, I also think that it would be right to refuse CVE.

Thanks for all of your feedback, I've requested an update to the GitHub advisory database and to reject the assigned CVE.

The test probably should be something like this:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" xml:space="preserve">
  <desc>
    <form xmlns="http://www.w3.org/1999/xhtml" action="javascript:alert('1')">
        <input type="submit"/>
    </form>
  </desc>
</svg>

This would have the form in the DOM that, if submitted, would execute the script in action. Since desc is not rendered, the user can't access the button. However, depending on what the page is doing where this is injected, you can still make the form be submitted by clobbering an id. Let's say the victim page is a discussion board where posting a new comment involves clicking a "Comment" button:

<form id="commentform"></form>
...
<button form="commentform" type="submit">Comment</button>

or

<form id="commentform">
...
<a href="#" onclick="commentform.submit()" class="button">Comment</a>
</form>

If the injected SVG is earlier in the DOM and the form also has id="commentform", the user clicking "Comment" will submit the injected form and execute the attacker's script.

For CDATA, that's not an issue in XML context but can still be an issue in HTML context since the parsing rules are different in different places in HTML, which makes it possible to bypass. For example:

<svg xmlns="http://www.w3.org/2000/svg">
  <desc>
    <style><![CDATA[</style><img src onerror="alert(1)">]]></style>
  </desc>
</svg>

desc is an HTML integration point, so <style> above is parsed as an HTML style element (where CDATA sections are not supported).

Oh for CDATA you previously replaced with a text node. That should be safe also in HTML, no need to sanitize the text node.

Oh for CDATA you previously replaced with a text node. That should be safe also in HTML, no need to sanitize the text node.

Yepp, I also think it's fine. It has been changed in PR #72 and you've been commenting there as well, back then... 😄