[security] Reliance on default encoding
duglin opened this issue · 3 comments
Description
Multiple instances were identified in which the getByte() standard Java API is used without specifying any encoding. Doing so causes the Java SDK to rely on the system default encoding, which can differ across platforms and systems used by event actors and cause unexpected differences in processing of event data.
The specification states that appropriate and RFC-compliant encodings MUST be followed, but the implementation in the Java SDK and documentation should be improved to highlight the importance of matching encoding across actors.
Not all observed instances are necessarily problematic, as they are handling binary data. However, this behavior should be documented and handled in the SDK implementation, documentation, and supplied examples.
import io.cloudevents.CloudEvent;
import io.cloudevents.core.builder.CloudEventBuilder;
import java.net.URI;
final CloudEvent event = CloudEventBuilder.v1()
.withId("000")
.withType("example.demo")
.withSource(URI.create("http://example.com"))
.withData("text/plain","Hello world!".getBytes())
.build();
private byte[] getBinaryData(Message<?> message) {
Object payload = message.getPayload();
if (payload instanceof byte[]) {
return (byte[]) payload;
}
else if (payload instanceof String) {
return ((String) payload).getBytes(Charset.defaultCharset());
}
return null;
}
Exploit Scenario
The event producer, the intermediary (using the SDK), and the consumer use different default encodings for their systems. Without acknowledging a fixed encoding, the data is handled and processed using an unintended encoding, resulting in unexpected behavior.
Recommendations
Short term, improve the SDK documentation to highlight the importance of matching encoding acros actors.
Long term, review all similar instances across the SDK and improve test cases to cover handling of message and data encoding.
References
This was opened due to the Trail of Bits security review