`ArrayIndexOutOfBoundsException` in `CBORParser` for invalid UTF-8 String
fmeum opened this issue · 3 comments
The following Java snippet crashes with an ArrayIndexOutOfBoundsException
in CBORParser._finishShortText
:
import java.io.IOException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.CBORFactory;
public class JacksonCborCrash {
public static void main(String[] args) {
byte[] input = {0x66, (byte) 0xef, 0x7d, 0x7d, 0xa, 0x2d, (byte) 0xda};
CBORFactory factory = new CBORFactory();
ObjectMapper mapper = new ObjectMapper(factory);
try {
mapper.readTree(input);
} catch (IOException e) {}
}
}
The stack trace with version 2.12.1 is:
java.lang.ArrayIndexOutOfBoundsException: Index 7 out of bounds for length 7
at com.fasterxml.jackson.dataformat.cbor.CBORParser._finishShortText(CBORParser.java:2203)
at com.fasterxml.jackson.dataformat.cbor.CBORParser._finishTextToken(CBORParser.java:2170)
at com.fasterxml.jackson.dataformat.cbor.CBORParser.getText(CBORParser.java:1530)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeAny(JsonNodeDeserializer.java:545)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:74)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:16)
at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4635)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3056)
This issue appears to be caused by missing bounds checks in the cases of this switch statement.
@fmeum Thank you for reporting this, I'll need to have a look.
Ah. So this is broken encoding; the last byte is broken initial byte of 2-byte UTF-8 character, which is why illegal access is made.
Should be caught and reported of course (unexpected end of content), will need to see what's the easiest way.
Fixed for 2.12.2; 2 problems:
- Not verifying buffer boundary, matters for the specific case if the last byte implies existence one or more bytes of multi-byte UTF-8 character
- But also should verify UTF-8 encoding goodness: example case is broken (maybe it's Latin-1?); should fail earlier
Slightly worried about (2) in a patch release (and hence no backport for 2.11) since while validation really should be done, no doubt some content exist where "it used to 'work'" (i.e. butchered occasionally mis-encoded character but no one noticed, or some validation removed that garbage later on), but we'll see.
Will also file a follow-up issue for 2.13 since as of now validation of UTF-8 characters is inconsistent across code paths -- it shouldn't be, but it is. For 2.13 we can make things more strict more generally.