SolidOS/solid-panes

XSS in markdown renderer

Closed this issue · 6 comments

Split out of #353

The markdown implementation is vulnerable to XSS. When viewing a markdown file, arbitrary scripts from this file can be executed. As an example, go to https://sheep.solidcommunity.net/public/ and click on the xss.md file, which will call print() as an XSS demonstration. The attacker then has the same privileges as mashlib (likely read, write and control).

A POC markdown file that will call print() when it is displayed:

# Hello

<img src=1234 onerror=print() />

A relevant comment from @bourgeoa:

SolidOS is using https://github.com/markedjs/marked. They recommend sanitizing the output with https://github.com/cure53/DOMPurify.
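
For illustration, a minimal sketch of what that combination means for the payload from the POC above (assuming the browser build of DOMPurify; the exact serialization may differ slightly):

// DOMPurify keeps the harmless parts of the markup and drops the event handler.
import DOMPurify from 'dompurify'

const dirty = '<img src=1234 onerror=print() />'
const clean = DOMPurify.sanitize(dirty)
console.log(clean) // roughly: <img src="1234"> (the onerror attribute is stripped)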

This should be moved to the sourcePane repository.

Whenever you create a new text file, the mintNew function inside sourcePane.js is called to create/mint the file in question.

The code I am working on will utilize the DOMPurify library.

However, there are some future concerns.

  1. Even though this will fix future stored XSS exploits, how can we sanitize the already existing files on users' pods?
    Essentially those who are grandfathered in. There will be unsanitized files resting on pods, unless some kind of iterative approach can be used to query each user's list of items in the storage area and sanitize them (a rough sketch of such an iteration follows below).
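
A rough sketch of what that iteration could look like with rdflib, assuming the pod exposes its storage as LDP containers; the container URL is a placeholder and the actual "sanitize" step is left open:

// List the markdown files in a pod container via ldp:contains.
// What sanitizing the markdown source should actually do is still an open
// question, so this only finds the candidate files in one container.
import * as $rdf from 'rdflib'

const LDP = $rdf.Namespace('http://www.w3.org/ns/ldp#')
const kb = $rdf.graph()
const fetcher = new $rdf.Fetcher(kb)

async function listMarkdownFiles (containerUri) {
  const container = $rdf.sym(containerUri)
  await fetcher.load(container) // reads the container listing into the store
  return kb.each(container, LDP('contains')) // one node per contained resource
    .filter(file => file.uri.endsWith('.md'))
}

listMarkdownFiles('https://example.solidcommunity.net/public/')
  .then(files => files.forEach(file => console.log('candidate for sanitizing:', file.uri)))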

I think this issue should stay here, as the problem lies in rendering the markdown file as unsanitized html rather than saving the markdown file. The steps for an exploit are:

  1. Evil user creates xss.md on the pod (including <img src=1234 onerror=print() />)
  2. User opens xss.md in mashlib
  3. Mashlib fetches markdown and converts it to html
  4. Mashlib inserts html to document
  5. The inserted html executes print(), as image loading fails (1234 is not an existing image file)

We can't control step 1. The evil user can create the markdown file with mashlib, solid-filemanager, a node script, or probably even by appending directly to the user's inbox. So this doesn't need to happen through mashlib. Or, as you already pointed out, the files could already exist on the pod.

However, we can sanitize the html at step 3. So if there is evil markdown, we convert it to html and remove the evilness before showing it to the user.

The conversion step is in the solid-panes repository:

// render markdown to html
const markdownHtml = function () {
  kb.fetcher.webOperation('GET', subject.uri).then(response => {
    const markdownText = response.responseText
    const lines = Math.min(30, markdownText.split(/\n/).length + 5)
    const res = marked.parse(markdownText) // markdown -> html, currently unsanitized
    frame.innerHTML = res // the unsanitized html is inserted into the document here
    frame.setAttribute('class', 'doc')
    frame.setAttribute('style', `border: 1px solid; padding: 1em; height: ${lines}em; width: 800px; resize: both; overflow: auto;`)
  })
}

So the sanitization should happen between lines 94 and 95, i.e. between the marked.parse call and the frame.innerHTML assignment.
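
For concreteness, a hedged sketch of the function above with that sanitization step added, assuming DOMPurify is available as a dependency of the pane (not a merged patch, just the idea applied in place):

// render markdown to html, sanitize, then insert into the document
const markdownHtml = function () {
  kb.fetcher.webOperation('GET', subject.uri).then(response => {
    const markdownText = response.responseText
    const lines = Math.min(30, markdownText.split(/\n/).length + 5)
    const res = marked.parse(markdownText)
    frame.innerHTML = DOMPurify.sanitize(res) // event handlers and scripts are stripped before insertion
    frame.setAttribute('class', 'doc')
    frame.setAttribute('style', `border: 1px solid; padding: 1em; height: ${lines}em; width: 800px; resize: both; overflow: auto;`)
  })
}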

@Otto-AA thanks
@chunt007 the sanitization process here only concerns rendering markdown to html, not the markdown source. In the future the chat should also render markdown, and sanitization should apply there as well.

@Otto-AA thank you for the additional information. I have the fix in place, ready to go. Rendering and sanitizing in one step, before the result is inserted into the page, makes sense. However, we can control step 1 at the source as well. The moment they push the Edit/Continue/Save button, it can be sanitized. But I guess that wouldn't matter much if a new user visits a malicious storage area, because the new humanReadablePane would have the sanitization option and sanitize the rendering before it loads in their browser. @bourgeoa thank you.

However, we can control step 1 at the source as well. The moment they push the Edit/Continue/Save button, it can be sanitized.

To some extent yes and no. If you sanitize it when saving the file, you prevent users from saving malicious markdown through mashlib. This won't prevent the exploit I've described above, because malicious users will just use another tool to save the markdown file. It would only prevent users from accidentally creating malicious markdown files with mashlib (e.g. by copying random malicious markdown they found on the internet).

So sanitizing at step 3 is still necessary. But of course, sanitizing at both step 1 and step 3 is also a good solution 👍

Great brainstorming on this topic. I really like the angles here. It just shows how vulnerable things can be. It only fuels the passion for much-needed work on the security side of things.