ruby/psych

Add settings for alias limits, recursion, and duplicate keys

headius opened this issue · 8 comments

In #613 and #647 I exposed settings in SnakeYAML Engine for limiting code points, aliases, recursive keys, and duplicate keys.

I was able to add one test that worked, for the code point limit setting. I could not get the others to error, and I think I've figured out why: they are not done at the parser level.

The maximum alias detection, for example, which is used to prevent the "billion laughs" attack, is done above the parser in SnakeYAML Engine as part of composing YAML nodes for the rest of the library. JRuby's Psych backend bypasses the composer/node API and uses the parser directly, with the node-wrangling logic living in the rest of Psych.

https://bitbucket.org/snakeyaml/snakeyaml-engine/src/8fc70e5d943a24ed9eb94eb333a4da7ca79e62c1/src/main/java/org/snakeyaml/engine/v2/composer/Composer.java#Composer.java-187:194

I suspect this is the same situation for the duplicate and recursive keys, but hopefully @asomov can tell us if this is the case.

If so, then three out of the four settings I added are not really being used by SnakeYAML Engine, and they should be changed into Psych-level settings. However I am not sure how the C version of Psych detects and prevents these situations. Need clarification from @tenderlove or @hsbt.

Alternatively, it may make sense for the SnakeYAML Engine parser itself to honor these settings, but that would be up to @asomov.

@oliverbarnes Ping! I believe this is why I could not get the other settings to work properly. I'm glad the one you needed is hooked up, though!

@tenderlove @hsbt I would like your input here. In #647 I added the ability to set the SnakeYAML defaults for four settings, but it appears only one of those four is actually checked in the parser (maximum code points). The others appear to be done as part of SnakeYAML's node graph building, such as the alias limit I mention in this issue's description.

In order for these settings to be useful, one of two things would need to happen:

  • SnakeYAML Engine would need to check them during parse, rather than during tree building or other higher-level parts of the API as it does now; OR
  • Psych would need to add equivalent checks within its own tree-building etc logic. (Probably the more logical place since it would bring these limits to all Psych users).

Either of these would require work beyond my expertise, so I don't expect them to happen soon. And that leaves us with three JRuby-specific settings that don't really do anything (they do, but they don't affect execution of the parts of SnakeYAML Engine that Psych uses).

I'm thinking it would just be best to remove the other settings until such time as they make sense to add again. Then we could get a release out with the code point setting available. We keep getting bug reports about it, so I'd like a Psych update we can ship in 9.4.4.0 by the end of this month.

@tenderlove @hsbt Thoughts?

I've pushed #653 which removes the ineffective settings.

hsbt commented

I'm +1 to remove them.

@headius yes we should remove them. The default API (YAML.load) doesn't allow aliases or recursive data structures by default, the user has to specifically opt-in. We could theoretically add alias / depth limiting to the AST -> Ruby pass, but I'm not sure if it's necessary.

@oliverbarnes Ping! I believe this is why I could not get the other settings to work properly. I'm glad the one you needed is hooked up, though!

thanks for the heads up @headius! I'm glad there'll be a resolution for these settings for 9.4.4.0, good to know

asomov commented

Unfortunately, I do not know know how to change SnakeYAML Engine - some checks can only be done when the Node tree is created.

@tenderlove Sounds good. Perhaps the code point limit could be added as an additional DoS measure, but I'll go with what I have to remove the inoperable settings and hopefully we can release that soon.