FasterXML/jackson-core

Add configurable limit for the maximum number of bytes or chars that we will parse in a number

pjfanning opened this issue · 6 comments

Number parsing can be expensive and is not O(n) - a string with twice the number of digits as another string takes more than twice as long to parse.

A malicious actor might craft dangerous a JSON payload with very long numbers.

Idea would be to have a default limit - eg 1000 chars/bytes and to allow users to configure a bigger limit. A setting maybe on JsonFactory class.

Relates to #813 - @cowtowncoder @plokhotnyuk this might be a more achievable v2.14 work item than the other new issues I raised.

I think this would fall under #637 as another case?

I guess this is point 4 in #637 but in practise, it would be easier to treat 637 as an epic and to have subtasks to implement the individual tasks.

I guess this will need to be done within the various JSON parser classes (I think there are 4) because you have access to the JsonParser config there and the earlier the issue is caught the better. Would JsonFactory be a good place to put the config?

Right. And not just JSON parsers, many limits should apply to other formats too.

As to configuration, I think there's probably need for base class (for likely shared limits, across most formats) and probably format-specific subtypes.

Or perhaps it'd be more practical NOT to create general limit system and only start with JSON.

Either way, yes, JsonFactory/TokenStreamFactory would be the place. And limit class should be immutable, builder-style.

As to Epic/tasks; yes, I guess. Could have one setup task for initial wiring, although that is likely to be done as part of implementing first limits.

Woodstox has something similar for XML, for what that is worth. Not sure how useful it is, but for limits it imposes I found that the easiest places to add checks tended to be places where buffer boundary is reached (need to increase buffer). This avoids having to do check for each and every character for example.

@cowtowncoder I had a look at adding a setting to TokenStreamFactory (inherited by JsonFactory) but accessing the factory settings in the JsonParser instance will require quite a bit of plumbing. I think getting the maxNumLen value into the TextBuffer would allow for some useful checks there.

JsonFactory has this

    protected JsonParser _createParser(InputStream in, IOContext ctxt) throws IOException {
        try {
            return new ByteSourceJsonBootstrapper(ctxt, in).constructParser(_parserFeatures,
                    _objectCodec, _byteSymbolCanonicalizer, _rootCharSymbols, _factoryFeatures);

It's sort of a pity that constructParser doesn't just take the JsonFactory as a param as opposed to passing through all those instance variables.

  • I could add an extra param to constructParser and deprecate the existing version of constructParser
  • Or I could add a setMaxNumLen to JsonParser and call that setter after constructParser but before returning the parser instance.
  • Or I could modify ObjectCodec or one the existing to set the maxNumLen there so constructParser could pass on the maxNumLen that way

Which of these approaches make the most sense to you? Or would you have an alternative approach that could be used?

I have a very rough draft in #827

Ok, I'll add same note(s) as in PR: I think we should have immutable configuration value objects (like StreamReadConfig, StreamWriteConfig?), constructed using builder() (defaulting to some sane values). This would then be passed to parser/generator (accessed from builder). This would reduce number of changes when we add new settings, avoid API changes wrt plumbing.
These types should also allow sub-classing on per-format basis. Probably should use "self-type" approach for builders but not values.
Finally, parser/generator should not have setters, unless 2.x absolutely requires one. Could expose getter, might be useful.

I can help with the implementation.

Note that it's perfectly fine to require use of Builder-style construction for new functionality like this (for 2.15), just cannot retrofit older configurability. But we may -- if we want to -- also expose setter(s) for JsonFactory if that makes sense. But try to keep parser/generator instances as immutable as possible.

Fixed via #827, need to slightly modify release notes to point to this issue.