remorhaz/php-json-path

Remote code execution weak spot

Closed this issue · 9 comments

jlcd commented

As per #22, there is an eval function being called that is leaking memory. It seems to be the default PHP behavior when using eval.

But I wanted to point out that using eval is really bad practice and would like to ask you to kindly consider improving this portion of the code.

And this particular eval could lead to remote code execution depending on how someone uses this class.
To give an example of how this may be used to remotely execute code is this:
Input: $queryFactory->createQuery('$.a.b[467]')
Would generate the following code to be executed:

return function (Remorhaz\JSON\Path\Value\NodeValueListInterface $input, Remorhaz\JSON\Path\Runtime\ValueListFetcherInterface $valueListFetcher, Remorhaz\JSON\Path\Runtime\EvaluatorInterface $evaluator, Remorhaz\JSON\Path\Runtime\LiteralFactoryInterface $literalFactory, Remorhaz\JSON\Path\Runtime\Matcher\MatcherFactoryInterface $matcherFactory) : Remorhaz\JSON\Path\Value\ValueListInterface {
    $var6 = $matcherFactory->matchElementStrictly(467);
    $var4 = $matcherFactory->matchPropertyStrictly('b');
    $var2 = $matcherFactory->matchPropertyStrictly('a');
    $var3 = $valueListFetcher->fetchChildren($input, $var2);
    $var5 = $valueListFetcher->fetchChildren($var3, $var4);
    $var7 = $valueListFetcher->fetchChildren($var5, $var6);
    return $var7;
};

Now, if someones relies on this lib to path through a json file, without properly sanitizing the user input, this could lead to someone exploiting the $queryFactory->createQuery() parameter that is passed through to eval as matchPropertyStrictly parameters.

For reference, I'm talking specificaly about

$callback = eval($this->getCallbackCode());

Not sure if there are other eval uses in other parts of the code.

Let me know your thoughts about this.

Well, eval() is not a bad practice by itself; the bad practice is to eval user input. In our case, the query is compiled into PHP code after parsing the query into the AST to avoid overhead on subsequent query calls - just like any other compiler does. CallbackBuilder accepts AST tree, generated by query parser, and generates output from PHP AST using well-tested and widely used for code generation nikic/php-parser. I see no way user input can get into the eval() unfiltered. If you do, please provide some example.

As far as I know, PHP provides us exactly two ways to execute generated code:

  1. using include on a generated file;
  2. using eval on a string, which works exactly as (1) under the hood: PHP just saves the eval'ed string in temporary file and includes it.

If you know another way to execute generated PHP code - please let me know. It's not an irony - PHP evolves very quickly, I could miss something.

Not sure if there are other eval uses in other parts of the code.

It's the only case for sure. I would never use eval() without a really strong reason.

an eval function being called that is leaking memory

It's not an eval/include itself, it's including the closure that leaks:
https://bugs.php.net/bug.php?id=76982

If that's the case, I can try to replace the closure with a dynamic class with magic __invoke method. If that won't work - there is also a radical way: to replace eval with some table finite-state machine; but that will significantly reduce the speed of executing the query and will bring unnecessary complexity to the library.

Well, anonymous class leaks the same way as closure, that's sad. I have one more idea for a quick fix, though...

jlcd commented

Well, anonymous class leaks the same way as closure, that's sad. I have one more idea for a quick fix, though...

Great, let me know, I even tried runkit7 without luck, as it also suffers from the same issue.

jlcd commented

As per

If you know another way to execute generated PHP code - please let me know. It's not an irony - PHP evolves very quickly, I could miss something.

I'm not aware of any other strong ways, except maybe runkit (extension) and create_function (deprecated), but they do about the same eval does.

Please try version 0.7.5. I've moved closure from the generated code; now we have to call eval each time the query is evaluated (and that is still faster than hand-made FSM probably), but there's no more closure in evaluated code and thus the memory leak should go away. Please let me know if that's true.

jlcd commented

Seems to be leak-free now 👍
I'll run more tests and, if it leaks somehow, Ill let you know.

Thanks heaps! :)

Nice to hear that, implementing FSM in the middle of the night didn't sound like a great idea :)

I'm just curious: for what purpose did you you need such amount of JsonPath queries?

jlcd commented

Nice to hear that, implementing FSM in the middle of the night didn't sound like a great idea :)

I'm just curious: for what purpose did you you need such amount of JsonPath queries?

Well, tbh I'm not planning on running all theses millions of queries in one running process (although we will have a few thousand), and will move to dedicated workers next week. I'm creating a crawler to extract data from some websites and they return a json response, so I created an easy way to map each field I'm interested in and use JsonPath for it. During the development phase I usually let the whole site to be crawled, so I can find edge cases, and this is where the memory leak was getting in the way.

Seems to get fixed in 0.7.5.