Protobuf data corruption when CloudEvent is serialized/deserialized several times
mishap4 opened this issue · 16 comments
Any time serialize is called the payload is packed into Any, even it was already Any:
UT:
@Test
public void testMultipleSerializationDeserialization() throws InvalidProtocolBufferException {
EventFormat serializer = new ProtobufFormat();
Status status = Status.newBuilder().setCode(Code.UNKNOWN_VALUE).build();
CloudEvent event1 = buildCloudEvent("id123", "test", "http:/test", status);
CloudEventData data1 = event1.getData();
assertTrue(data1 instanceof ProtoCloudEventData);
Message message1 = ((ProtoCloudEventData) data1).getMessage();
//assertTrue(message1 instanceof Any); // Not Any???
assertTrue(message1 instanceof Status);
assertEquals(status, message1);
byte[] raw1 = serializer.serialize(event1);
System.out.println(" event1: " + event1);
System.out.println(" raw1: " + Arrays.toString(raw1));
System.out.println(" size1: " + raw1.length);
System.out.println("payload1: " + message1);
CloudEvent event2 = serializer.deserialize(raw1);
CloudEventData data2 = event2.getData();
assertTrue(data2 instanceof ProtoCloudEventData);
Message message2 = ((ProtoCloudEventData) data2).getMessage();
assertTrue(message2 instanceof Any);
assertEquals(status, ((Any) message2).unpack(Status.class));
byte[] raw2 = serializer.serialize(event2);
System.out.println(" event2: " + event2);
System.out.println(" raw2: " + Arrays.toString(raw2));
System.out.println(" size2: " + raw2.length);
System.out.println("payload2: " + message2);
//assertEquals(event1, event2);
assertArrayEquals(raw1, raw2);
CloudEvent event3 = serializer.deserialize(raw2);
CloudEventData data3 = event3.getData();
assertTrue(data3 instanceof ProtoCloudEventData);
Message message3 = ((ProtoCloudEventData) data3).getMessage();
assertTrue(message3 instanceof Any);
assertEquals(status, (((Any) message3).unpack(Any.class).unpack(Status.class)));
byte[] raw3 = serializer.serialize(event3);
System.out.println(" event3: " + event3);
System.out.println(" raw3: " + Arrays.toString(raw3));
System.out.println(" size3: " + raw3.length);
System.out.println("payload3: " + message3);
//assertEquals(event1, event3);
//assertEquals(event2, event3);
assertArrayEquals(raw1, raw3);
assertArrayEquals(raw2, raw3);
}
static CloudEvent buildCloudEvent(String id, String type, String source, Message message) {
return CloudEventBuilder.v1()
.withId(id)
.withType(type)
.withSource(URI.create(source))
.withData(ProtoCloudEventData.wrap(message))
.build();
}
event1: CloudEvent{id='id123', source=http:/test, type='test', data=io.cloudevents.protobuf.ProtoDataWrapper@6da29a93, extensions={}}
raw1: [10, 5, 105, 100, 49, 50, 51, 18, 10, 104, 116, 116, 112, 58, 47, 116, 101, 115, 116, 26, 3, 49, 46, 48, 34, 4, 116, 101, 115, 116, 66, 43, 10, 37, 116, 121, 112, 101, 46, 103, 111, 111, 103, 108, 101, 97, 112, 105, 115, 46, 99, 111, 109, 47, 103, 111, 111, 103, 108, 101, 46, 114, 112, 99, 46, 83, 116, 97, 116, 117, 115, 18, 2, 8, 2]
size1: 75
payload1: code: 2
event2: CloudEvent{id='id123', source=http:/test, type='test', data=io.cloudevents.protobuf.ProtoDeserializer$ProtoAccessor@1ba03bde, extensions={}}
raw2: [10, 5, 105, 100, 49, 50, 51, 18, 10, 104, 116, 116, 112, 58, 47, 116, 101, 115, 116, 26, 3, 49, 46, 48, 34, 4, 116, 101, 115, 116, 66, 86, 10, 39, 116, 121, 112, 101, 46, 103, 111, 111, 103, 108, 101, 97, 112, 105, 115, 46, 99, 111, 109, 47, 103, 111, 111, 103, 108, 101, 46, 112, 114, 111, 116, 111, 98, 117, 102, 46, 65, 110, 121, 18, 43, 10, 37, 116, 121, 112, 101, 46, 103, 111, 111, 103, 108, 101, 97, 112, 105, 115, 46, 99, 111, 109, 47, 103, 111, 111, 103, 108, 101, 46, 114, 112, 99, 46, 83, 116, 97, 116, 117, 115, 18, 2, 8, 2]
size2: 118
payload2: type_url: "type.googleapis.com/google.rpc.Status"
value: "\b\002"
I believe ProtoSerializer.java should check that:
public CloudEvent end(CloudEventData data) throws CloudEventRWException {
if (data != null) {
String dataContentType = null;
Map<String, CloudEventAttributeValue> attributesMap = this.protoBuilder.getAttributesMap();
CloudEventAttributeValue attrVal = (CloudEventAttributeValue)attributesMap.get("datacontenttype");
if (attrVal != null && attrVal.hasCeString()) {
dataContentType = attrVal.getCeString();
}
ProtoCloudEventData protoData;
**if (data instanceof ProtoCloudEventData) {
protoData = (ProtoCloudEventData)data;
Message message = protoData.getMessage();
if (message != null) {
if (message instanceof Any) {
this.protoBuilder.setProtoData((Any) message);
} else {
this.protoBuilder.setProtoData(Any.pack(message));
}
}**
....
This test reproduces the issue in the SDK tests:
@ParameterizedTest
@MethodSource("roundTripTestArgumentsCloudEvents")
public void testRoundTrip(io.cloudevents.CloudEvent event) {
byte[] rawData = format.serialize(event);
byte[] bytes1 = format.serialize(format.deserialize(rawData));
byte[] bytes2 = format.serialize(format.deserialize(bytes1));
assertThat(rawData).isEqualTo(bytes1).as("roundtrip 1 vs roundtrip 2");
assertThat(bytes1).isEqualTo(bytes2).as("roundtrip 2 vs roundtrip 3");
}
public static Stream<io.cloudevents.CloudEvent> roundTripTestArgumentsCloudEvents() {
return Stream.of(
CloudEventBuilder.v1()
.withId(UUID.randomUUID().toString())
.withSource(SOURCE)
.withType(TYPE)
.withData(ProtoCloudEventData.wrap(Quote.newBuilder()
.setHigh(Decimal.newBuilder().setScale(10).build())
.setPrice(Decimal.newBuilder().setScale(10).build())
.setSymbol("$")
.build()))
.build()
);
}
@pierDipi, could you please take care on the fix integration. I believe you have more knowledge about the project and nothing is broken...
I can try and take a look at this being one of the original authors :-(
Other observation. When you create an event and get the data, it is Message, after serialization/deserialization it is Any(Message). Not consistent behavior.
Thx @mishap4 - Out of interest if you compare the "CloudEvent' objects as opposed to the serialized form does that still fail .. not that it should make much difference but ......
It's starting to come back to me now ... with respect to the data handling there's no guarantee that a receiving application/intermediary will be able to hydrate the actual ProtoBuf message from the data on reception, that's why the 'Any' construct is returned so that the business app can do the appropriate type mapping if needed, or pass it on elsewhere if required.
Granted this does mean the 'create' and 'read' aren't entirely symmetrical ..
Maybe I don't understand something, but spec says that for protobuf format CloudEvent should contain:
oneof data {
// Binary data
bytes binary_data = 2;
// String data
string text_data = 3;
// Protobuf Message data
google.protobuf.Any proto_data = 4;
}
I was assuming once CloudEvent is created with a Message it is packed into Any implicitly (or you give Any to the builder), so getter should return Any(Message) no matter whether this event still on sending side or receiving one...
That's a fair point ... I've addressed the original issue so it doesn't corrupt any more.
I'm happy to make the other change so that it's symmetrical, theoretically one might consider that a behavioral change but I can go ahead and do that as well in this fix.
@pierDipi - Thoughts... I'm kind-of leaning toward making the other change now, any immediate concerns ?
For breaking API changes we need a new major version published, on main we have already a few breaking changes [1], so the next version will be 3.0 even tho the actual release payload is small and most users won't notice anything.
What I would do is:
- Fix the issue in a non breaking way (and afaiui as of now #524 is not a breaking change), so that we could backport the fix to 2.4.
- Fix the other issue separately for 3.0
[1] https://github.com/cloudevents/sdk-java/milestone/13?closed=1
What do you think?
In addition, once we backport the minimal fix to 2.4.x, I can also mention in the release notes the known issue of the other change that we plan to have for 3.0
Agree ... let's merge this as-is and address the consistency issue seperately.
2.4.2 has been released with this fix https://github.com/cloudevents/sdk-java/releases/tag/2.4.2
That was a super quick resolution. Thank you!