streem/pbandk

Parsing of the enum field causes exception thrown internally

Opened this issue · 4 comments

There is a code in the JsonValueDecoder.readEnum which tries to decode enum field as integer first, which in most cases, when provided enum value is their name, causes the JsonDecodingException to be thrown but immediately caught and suppressed by mapExceptionsToNull { ... }. Only then decoder tries to deserialize enum field by name. JsonValueDecoder.writeEnum says that enum may be serialized as int in case of unrecognized value, and this is understandable, but is quite a rare case, isn't is?

fun readEnum(value: JsonElement, enumCompanion: Message.Enum.Companion<*>): Message.Enum? = try {
        val p = value.jsonPrimitive
        p.intOrNull?.let { return enumCompanion.fromValue(it) }  // 1. decode as int first
        require(p.isString) { "Non-numeric enum values must be quoted" }
        enumCompanion.fromName(p.content)                        // 2. the decode as name
    } catch (e: Exception) {
        ...
    }

But creating and throwing an exception on every enum field while running a "hot" code with many calls to deserialization leads to bad performance. Shouldn't the JsonValueDecoder.readEnum try deserialize enum by name first because this is a most common case? This will completely exclude throwing the JsonDecodingException when it's obviously known that your system operates with enum names but not unrecognized values.

@Brikman out of curiosity, have you measured the performance impact?

JsonValueDecoder.writeEnum says that enum may be serialized as int in case of unrecognized value, and this is understandable, but is quite a rare case, isn't is?

I agree that it's the less common case. But I wouldn't assume it's rare: enums can be serialized as int not only when they're unrecognized but also when someone explicitly uses the printingEnumsAsInts() option in protobuf-java. Pbandk doesn't currently have a similar option when serializing enums to JSON, but presumably protobuf-java added this option because some subset of users prefers to use ints instead of strings for serializing all enums.

I faced this problem actually exploring the cause of a bad performance of my application. I have a hot code which deserializes a lot of JSONs with a high rate (reading SQL dataset with lots of rows from the database where one column is a JSONB, deserialized by pbandk), and that fragment of code was pretty slow. JFR analysis showed me a very high rate of throwing the JsonDecodingException.

Then I tried to use Jackson and temporary DTO with similar structure and custom name-based enum deserializer, and the same code showed much better performance (up to x2 faster).

So, even if Jackson becomes faster, which in regular case must be slower than pbandk, I think, performance impact becomes pretty significant.

Thanks for the extra context @Brikman. In that case I agree that this code should be rewritten to avoid using exceptions in the common path. I'd gladly accept a PR if you're up for it. Otherwise, I'll work on this when I have a chance, but I can't promise when that'll be.

I should add that the JSON support in pbandk has mostly focused on correctness so far. The binary encoding is going to be more performant. High performance JSON decoding is definitely a goal, and of course there are use cases that require JSON. I just haven't had time to focus on it yet. That said, suggestions for specific improvements like you are bringing up here are always appreciated, and are more likely to be implemented sooner.