[security] Reliance on default encoding

Question

[security] Reliance on default encoding

duglin opened this issue 2 years ago · 3 comments

Description

Multiple instances were identified in which the getByte() standard Java API is used without specifying any encoding. Doing so causes the Java SDK to rely on the system default encoding, which can differ across platforms and systems used by event actors and cause unexpected differences in processing of event data.
The specification states that appropriate and RFC-compliant encodings MUST be followed, but the implementation in the Java SDK and documentation should be improved to highlight the importance of matching encoding across actors.
Not all observed instances are necessarily problematic, as they are handling binary data. However, this behavior should be documented and handled in the SDK implementation, documentation, and supplied examples.

import io.cloudevents.CloudEvent;
import io.cloudevents.core.builder.CloudEventBuilder;

import java.net.URI;

final CloudEvent event = CloudEventBuilder.v1()
    .withId("000")
    .withType("example.demo")
    .withSource(URI.create("http://example.com"))
    .withData("text/plain","Hello world!".getBytes())
    .build();

code

private byte[] getBinaryData(Message<?> message) {
     Object payload = message.getPayload();
     if (payload instanceof byte[]) {
       return (byte[]) payload;
     }
     else if (payload instanceof String) {
            return ((String) payload).getBytes(Charset.defaultCharset());
     }
     return null;
}

code

Exploit Scenario

The event producer, the intermediary (using the SDK), and the consumer use different default encodings for their systems. Without acknowledging a fixed encoding, the data is handled and processed using an unintended encoding, resulting in unexpected behavior.

Recommendations

Short term, improve the SDK documentation to highlight the importance of matching encoding acros actors.
Long term, review all similar instances across the SDK and improve test cases to cover handling of message and data encoding.

References

Answer 1 · 2022-11-08T19:52:10.000Z

This was opened due to the Trail of Bits security review

Answer 2 · 2022-12-08T12:13:04.000Z

@pierDipi any thoughts on this one?

Answer 3 · 2022-12-08T16:17:47.000Z

@pierDipi see #491