cshaa/filtrex

Filtrex fails CSP

Opened this issue · 9 comments

Hi folks,

We are happily using filtrex for quite some time but we see some of our clients complain that the filtrex code is unsecure becuse it uses new Function.

See https://stackoverflow.com/questions/52573756/what-are-the-eval-related-functions-to-be-avoided-when-csp-is-enabled for instance.

Is that something that can be fixed? I guess more and more people will run into this with filtrex eventually as more organisations start to use more strict CSP rules.

Cheers,
Danny

cshaa commented

Hey, Danny!
Thanks for getting in touch!

Internally, filtrex works by parsing the expression and immediately translating it to the corresponding JavaScript code. However, there is no way for the user to directly enter any JavaScript code at all – the resulting JS code is always composed of pre-made chunks of code (eg. this) and properly escaped user input (eg. this). The only thing that passes right into the JS code unescaped are numbers, but the allowed syntax there has been chosen very conservatively. Therefore, personally I wouldn't be too worried about an XSS.

That being said, everybody makes mistakes, and even if I didn't, that wouldn't change company policies :) There are several ways I can think of that would allow filtrex not to use new Function:

  • By far the simplest solution would be to pass the JS code of the compilled expression directly to the developer. Then, they could pass it to their favorite JS sandbox (see this SO post). This would make everything much slower (by 2 to 3 orders of magnitude in the case when compile time is negligible when compared to run time), but if one prefers to sacrifice performance just to be extra safe, this could be the way to go. This is how a sample project would look then:
import { compileExpression } from 'filtrex';
import { safelyCompileJS } from 'my-favorite-js-sandbox';

let options = {
    compileJavaScript = (args, src) => {
        const vm = safelyCompileJS(`function result(${args}){${src}}`)
        return vm.getVariable('result')
    }
};

const f = compileExpression(`category == "meal"`, options)
  • Another solution would be to rewrite filtrex, so that it doesn't produce JS code, but instead an AST (Abstract Syntax Tree). Then, there would be a filtrex interpreter that reads the AST and computes the result. No JS code, no need for new Function. I don't like this solution, because it would be hard to make, hard to maintain and slower than the current implementation (for everybody, not just people who need to be extra safe).

  • Yet another solution would involve some functional-programming trickery. Since the resulting resulting JS code is composed of pre-made blocks of code, in theory, we don't need to produce a JS source code. Instead, the parser could just spit out working JS functions right away. This would be potentially faster than the current implementation (no need to compile JS), but I think there's a risk of making the code less readable (and therefore more prone to bugs and vulnerabilities). Also, this solution would make it harder to run filtrex on a worker or in the aforementioned JS sandbox – one would have to load the entire filtrex library in the worker/sandbox and parse the expression there.

I'd love to hear your take on this :)
If we decide to go with the first option, I think I can implement it myself in a short period of time. The other two options would require some prototyping and I could use a helping hand there.

Yeah, it is complicated indeed. I don't have a solution either. It's just that people look at their CSP evaluators and say: 'hey, this piece of code is not safe so we cannot allow it.' We have some software using filtrex and a government agency ran this check and complains that our software is not safe enough blablabla. All because somewhere in the code there is a new Function...

cshaa commented

@dannybloe Curious question: do your clients check whether new Function actually runs? Or is the occurence of new Function anywhere in the code enough for them to distrust the software?

I think they just did a simple search. We ship code with a piece that could theoretically be harmful. It is very hard to prove that a certain piece of code is ever executed of course. Same that it is almost impossible to prove some piece of code ends. At least that's what I learned in some computer science class millenia ago 😉

cshaa commented

Which means that in order for Filtrex to pass CSP, it can't have new Function by default and other options for paranoid users... 🤔 So the first option wouldn't work and only the other two remain.

Well, and things should definitly not become slower of course so that would also rule out option 1.

I think it would be possible to do both option 3 and also do issue #40 at once by rewriting to peggy, and write the grammar such that it builds a series of nested functions that can be passed the arbitrary object

I haven't actually tested it yet, but at least theoretically there shouldn't be much of a performance hit thanks to automatic inlining performed by modern javascript engines

I have had a lot of fun using pegjs in the past, so I'll try and take a stab at this on my own fork in the next weeks/month or two and let you know how it goes


const parser = peggy.generate(`
start
  = boolean

boolean
  = left:identifier "==" right:identifier { return (data) => left(data) === right(data) }

identifier
  = word:[a-z]* { return (data) => data?.[word.join("")] }
`)

compiledExp = parser.parse("foo==bar")

console.log(compiledExp({
  foo: 1,
  bar: 2
}))

console.log(compiledExp({
  foo: 1,
  bar: 1
}))```

Update on my previous comment having now taken a stab at this, as I unfortunately won't be able to follow through on an implementation -

This approach would pretty much entail a ground-up re-write of the library, which is unfortunately more than I've got in me at this time.
That said, I ended up taking the previously described approach at work to make a faster implementation of an existing internal proprietary (albeit much simpler) filter language and it worked pretty well, so it definitely could work here, though some of filtrex's more lexicographically advanced features might get a little tricky to parse with the PEG grammar

Yeah. I totally understand.