spring-cloud/spring-cloud-function

JsonMapper.isJsonStringRepresentsCollection may incorrectly return true for Protobuf-encoded messages

simonzn opened this issue · 10 comments

Using
Spring Boot 2.7.11
Spring Cloud Dependencies 2021.0.8

Describe the bug
Note: We ran into this issue with Protobuf encoded messages, but it should be reproducible with any binary serialization.

A Protobuf message with length encoded fields like

syntax = "proto3";

message StringValue {
  string value = 1;
}

is encoded into a tag, followed by the length of the encoded value (in bytes), followed by the payload. The tag is fieldNumber * 8 + 2 (see Protobuf wire types) = 10 in this example.

isJsonStringRepresentsCollection() strips the tag from the input if it has the same numeric representation as some whitespace character (newline in this case).

If the payload happens to be 34, 91, or 123 bytes long and end with 34 resp. 93 resp. 125 (decimal representation), isJsonStringRepresentsCollection() identifies the remaining input as JSON and fluxifyInputIfNecessary() calls fromJson(), which inevitably fails at some point with something like

Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 26)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: (byte[])"
[
\u001A\u0008�\u000F\u0010
\u0018\u0013 \u0007(\u001C0\u000E8���\u0001J\u0005
\u0003UTC\u0012\u0003\u0008�s\u0012\u0004\u0008�\u0002\u0012\u0004\u0008�\u0002\u0012\u0004\u0008��\u0001\u0012\u0004\u0008��\u0001\u0012\u0003\u0008�s\u0012\u0004\u0008��\u0001\u0012\u0004\u0008��\u0001\u0012\u0004\u0008��\u0001\u0012\u0004\u0008��\u0001\u0012\u0003\u0008�]"; line: 3, column: 2]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2391)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:735)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._throwInvalidSpace(ParserMinimalBase.java:713)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:3077)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:756)
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:346)
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:244)
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
    at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3723)
    at org.springframework.cloud.function.json.JacksonMapper.doFromJson(JacksonMapper.java:60)
    at org.springframework.cloud.function.json.JsonMapper.fromJson(JsonMapper.java:94)
    at org.springframework.cloud.function.context.catalog.SimpleFunctionRegistry$FunctionInvocationWrapper.fluxifyInputIfNecessary(SimpleFunctionRegistry.java:833)
    at org.springframework.cloud.function.context.catalog.SimpleFunctionRegistry$FunctionInvocationWrapper.doApply(SimpleFunctionRegistry.java:733)
    at org.springframework.cloud.function.context.catalog.SimpleFunctionRegistry$FunctionInvocationWrapper.apply(SimpleFunctionRegistry.java:590)
    at org.springframework.cloud.stream.function.PartitionAwareFunctionWrapper.apply(PartitionAwareFunctionWrapper.java:84)
    at org.springframework.cloud.stream.function.FunctionConfiguration$FunctionWrapper.apply(FunctionConfiguration.java:791)
    at org.springframework.cloud.stream.function.FunctionConfiguration$FunctionToDestinationBinder$1.handleMessageInternal(FunctionConfiguration.java:623)
    at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:55)
    ... 26 more 

The real culprit here seems to be fluxifyInputIfNecessary(), which ignores the content type header of a message entirely (almost, text/plain is handled). IMO fromJson() should only be called if the content type is undefined or application/json.

What contentType are you passing as message header?
It would be nice to have some small reproducible sample.

We use the ProtobufMessageConverter, which sets the content type to application/x-protobuf

However, as mentioned above, this is not limited to protobuf-encoded messages. From code inspection it's pretty clear that any serialization that is not JSON can cause that problem.

This is becoming a real problem for us. I have added a test that reproduces the issue.

Sorry for neglecting it. Will look at it shortly

olegz commented

While I added mapper.configure(DeserializationFeature.FAIL_ON_TRAILING_TOKENS, true); it broke #1148.
So I am removing it, and instead you can manually configure it as such

spring:
  jackson:
    deserialization: 
      fail-on-trailing-tokens: true

I need to ask why we have to configure a JSON mapper if we don't use JSON for our messages at all. The content type says the message is not JSON, so IMHO isJsonString() shouldn't be called in the first place.

With this "auto-sensing" of the content type we can probably still construct a failing scenario, i.e. a serialization that sometimes produces a valid JSON string, but isn't meant to be interpreted as JSON.

What am I missing?

olegz commented

No, you are not missing anything, but got me thinking if I can also check for content-type and if it is not application/json why bother . . . that I agree, so let me try something. I'll re-open the issue

Thanks for reopening this - we already found that the whole thing is skipped if the content type is plain text (https://github.com/spring-cloud/spring-cloud-function/blob/main/spring-cloud-function-context/src/main/java/org/springframework/cloud/function/context/catalog/SimpleFunctionRegistry.java#L857). Maybe the simplest fix would be to reverse this logic and call the JsonMapper only if the content type is undefined oder "application/json".

olegz commented

The problem is that not everything comes as Message, hence no way of communicating content-type.
I'll figure something out.
Can you help me though and attach a byte array representing one of your payloads so I can have a meaningful test .

olegz commented

Anyway, I have reverted the commit that would force you to provide the mentioned settings as I agree you don't need to do that when no using JSON.