Kafka deserialization of messages that have `datacontenttype=application/json` is inconsistent between structured and binary encodings
alexec commented
Consider the following 2 tests:
```java
package com.intuit.event.gateway.ingestor.rest.services.eventsources;

import static org.junit.jupiter.api.Assertions.*;

import io.cloudevents.jackson.JsonCloudEventData;
import io.cloudevents.kafka.CloudEventDeserializer;
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.header.internals.RecordHeaders;
import org.junit.jupiter.api.Test;

class KafkaEventSourcesTest {

    @Test
    void binaryEncoding() {
        try (var d = new CloudEventDeserializer()) {
            // Binary mode: attributes travel as ce_* headers, the record value is the raw data
            var h = new RecordHeaders();
            h.add("ce_specversion", "1.0".getBytes());
            h.add("ce_type", "test".getBytes());
            h.add("ce_source", "my-source".getBytes());
            h.add("ce_id", "my-id".getBytes());
            h.add("ce_datacontenttype", "application/json".getBytes());
            var e = d.deserialize("", h, "{}".getBytes(StandardCharsets.UTF_8));
            assertInstanceOf(JsonCloudEventData.class, e.getData());
        }
    }

    @Test
    void structuredEncoding() {
        try (var d = new CloudEventDeserializer()) {
            // Structured mode: the whole event (attributes and data) is one JSON document
            var h = new RecordHeaders();
            h.add("content-type", "application/cloudevents+json".getBytes());
            var e = d.deserialize("", h, ("{" +
                    "\"specversion\":\"1.0\"," +
                    "\"type\":\"test\"," +
                    "\"source\":\"my-source\"," +
                    "\"id\":\"my-id\"," +
                    "\"datacontenttype\":\"application/json\"," +
                    "\"data\":\"invalid-json\"" +
                    "}").getBytes(StandardCharsets.UTF_8));
            assertInstanceOf(JsonCloudEventData.class, e.getData());
        }
    }
}
```
They should both pass. Yet the first one fails and the second one passes.
How come? Why this inconsistency? Why don't we get `JsonCloudEventData` for both?
pierDipi commented
That's because, in general, the SDK doesn't inspect or deserialize the data unless it needs to.
You get a `JsonCloudEventData` object only when the CloudEvents attributes are themselves encoded in JSON (i.e. the JSON structured format plus a JSON `datacontenttype`). This is mainly an optimization: otherwise we would force users to deserialize the JSON twice, once to extract the CE attributes and a second time to deserialize the JSON data.
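The rule described above can be sketched as a small decision function. This is a hypothetical model, not the SDK's code: `DataKindModel`, `Encoding`, `DataKind`, `dataKindFor` and `isJson` are illustrative names, and the media-type check is a simplification. It also assumes, following the CloudEvents JSON event format, that a missing `datacontenttype` in structured mode is treated as `application/json`.

```java
import java.util.Locale;

final class DataKindModel {

    /** How the CloudEvent arrived on the Kafka record (illustrative). */
    enum Encoding { BINARY, STRUCTURED_JSON }

    /** Which kind of data object the consumer ends up with (illustrative names). */
    enum DataKind { BYTES, JSON_NODE }

    /**
     * Hypothetical model of the rule: data is exposed as an already-parsed JSON tree
     * only when the whole event was parsed as JSON (structured mode) AND the data is
     * declared as JSON. In binary mode the payload is handed over untouched.
     */
    static DataKind dataKindFor(Encoding encoding, String dataContentType) {
        if (encoding == Encoding.BINARY) {
            return DataKind.BYTES; // payload is never inspected, whatever its content type
        }
        return isJson(dataContentType) ? DataKind.JSON_NODE : DataKind.BYTES;
    }

    /** Simplified JSON media-type check (an assumption, not the SDK's exact logic). */
    static boolean isJson(String contentType) {
        if (contentType == null) {
            return true; // assumed: absent datacontenttype implies application/json in JSON format
        }
        String ct = contentType.toLowerCase(Locale.ROOT);
        int semi = ct.indexOf(';');
        if (semi >= 0) {
            ct = ct.substring(0, semi); // drop parameters such as "; charset=utf-8"
        }
        ct = ct.trim();
        return ct.equals("application/json") || ct.endsWith("+json");
    }

    public static void main(String[] args) {
        // Same inputs as the two tests in the issue
        System.out.println(dataKindFor(Encoding.BINARY, "application/json"));          // BYTES
        System.out.println(dataKindFor(Encoding.STRUCTURED_JSON, "application/json")); // JSON_NODE
    }
}
```

Fed the inputs of the two tests above, the model yields `BYTES` for the binary case and `JSON_NODE` for the structured one, which is exactly the asymmetry reported. A consumer that wants a JSON tree in both cases can parse the raw payload itself, for example with Jackson's `ObjectMapper.readTree(event.getData().toBytes())` whenever `getData()` is not already a `JsonCloudEventData`.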
alexec commented
That answers my question.