jruby/jruby

org.yaml.snakeyaml.error.YAMLException: The incoming YAML document exceeds the limit: 3145728 code points.

Closed this issue · 12 comments

donv commented

Environment Information

jruby 9.3.9.0 (2.6.8) 2022-10-24 537cd1f OpenJDK 64-Bit Server VM 17.0.5+8-LTS on 17.0.5+8-LTS +jit [x86_64-darwin]
Darwin 21.6.0 Darwin Kernel Version 21.6.0: Sun Nov 6 23:31:16 PST 2022; root:xnu-8020.240.14~1/RELEASE_X86_64 x86_64

Expected Behavior

Using Psych::Parser to parse a large yaml file succeeds.

Actual Behavior

With JRuby 9.3.9.0, parsing a large YAML file results in an exception:

 org.yaml.snakeyaml.error.YAMLException: The incoming YAML document exceeds the limit: 3145728 code points.

Reverting to jruby-9.3.8.0 works.

This is due to #7388 and the default limit of 3145728 (3 × 1024 × 1024) code points now enforced in SnakeYAML 1.32 (https://bitbucket.org/snakeyaml/snakeyaml/wiki/Changes).

Without looking at the specifics, I suspect this might require ruby/psych#579 so that the limit can be worked around by setting
https://javadoc.io/static/org.yaml/snakeyaml/1.32/org/yaml/snakeyaml/LoaderOptions.html#setCodePointLimit-int-
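As a rough way to tell whether a given document is likely to hit this, here is a minimal sketch that compares a string's code point count against the figure from the exception message. The constant and method names are hypothetical, and the exact boundary SnakeYAML applies (strictly greater than, or greater than or equal) is an assumption here, so treat it as an approximate pre-check only.

```ruby
# The figure reported in the exception message: 3 * 1024 * 1024 code points.
# (Hypothetical constant name; not part of Psych or SnakeYAML.)
DEFAULT_CODE_POINT_LIMIT = 3 * 1024 * 1024

# Hypothetical helper: true when the document has more code points than the
# default limit. Assumes the string is valid UTF-8, in which case
# String#length counts code points rather than bytes.
def exceeds_default_limit?(yaml)
  yaml.length > DEFAULT_CODE_POINT_LIMIT
end

puts exceeds_default_limit?("a: 1\n") # => false
```

Note that the limit counts code points, not bytes, so a file with many multi-byte characters can be larger than 3 MiB on disk and still fit.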

donv commented

Yeah, this looks right.

Any way to set the limit before ruby/psych#579 is done?

Apart from some hardcore runtime bytecode manipulation, I don't think so. Sadly, the LoaderOptions defaults appear to be hard-coded, with no external way to change the default values JVM-wide (e.g. static default holders or system properties).

I have added some of these methods in ruby/psych#613. This is a unilateral exposure of these properties only in the JRuby version, so we should try to work with the maintainers of the C extension and see if we can have the same API for both.

I've requested review for my changes in ruby/psych#613. I would also like to release psych 5.1 to incorporate the new SnakeYAML Engine in ruby/psych#612, so this is a good time to do it.

It looks like the ability to control this was merged in ruby/psych#613 and released in Psych 5.1.0, which is part of JRuby 9.4.1.0 via #7626. If that is all that is required on the JRuby side, we might be able to update the milestone here (currently 9.4.2.0) and close this, unless it needs a backport?

Also not sure if it needs some docs somewhere to show how to override them in normal use cases.

Oops, yup, this one should have been resolved as of 9.4.1.

There are no docs for the new features and no tests. Perhaps you could come up with some? I must admit I do not know exactly what YAML constructs the various settings apply to.

The irony is that my/our use case of JRuby actually doesn't rely on YAML parsing, Psych or SnakeYAML at all - it's just that we use jruby-complete and I like reducing noise for the community from CVEs. Some other things I work on have direct SnakeYAML exposure so was familiar with some of the noise/risks in the area and interested in the overlap with JRuby world.

I wonder if @donv has something on the test side.

Where do you suggest the docs would sit? Within Psych? Jruby itself somewhere?

donv commented

Hi!

Documenting the usage here since I could not find it anywhere else yet.

@parser = Psych::Parser.new
@parser.code_point_limit = 20_000_000

I have switched to JRuby 9.4.1.0 in development and it seems to work fine.
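For anyone who wants the parsed Ruby objects rather than just a bare parser, here is a slightly fuller sketch of the same idea. `parse_with_limit` is a hypothetical helper name, and the `respond_to?` guard is a precaution I am assuming is useful because the setter exists only in the JRuby extension (Psych 5.1.0 and later) and not in the C extension.

```ruby
require "psych"

# Hypothetical helper: parse a YAML string with a raised code point limit
# and return the first document as plain Ruby objects.
def parse_with_limit(yaml, limit)
  # TreeBuilder collects parser events into an AST we can convert afterwards.
  parser = Psych::Parser.new(Psych::TreeBuilder.new)

  # code_point_limit= is only available on JRuby's Psych (5.1.0+);
  # on other implementations the guard simply skips it.
  parser.code_point_limit = limit if parser.respond_to?(:code_point_limit=)

  parser.parse(yaml)

  # The root is a stream node; its first child is the first document.
  parser.handler.root.children.first.to_ruby
end

data = parse_with_limit("name: jruby\ntags:\n  - yaml\n  - psych\n", 20_000_000)
puts data.inspect
```

The limit is set per parser instance, so code that goes through `Psych.load` or `YAML.load` directly still gets the default.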

@donv thanks for confirming!

@chadlwilson Well, that is a good question. There is no logic in rdoc to generate documentation from the Java extension, and anyway we don't have the same feature in the C extension, so putting it in the general psych docs would not work.

@hsbt @tenderlove How should we handle this? Maybe we can collaborate to get these same config options supported in the C extension?