`rmp_serde::Raw` considered harmful
Lucretiel opened this issue · 2 comments
The unsafe
behavior of Raw
is trivially shown to be unsound. str
is required to be valid UTF-8, and its methods assume that it contains valid UTF-8 for the purpose of, for example, scanning for code points. I'm able to produce UB by creating a simple Serializer that serializes strings as an array of chars, and then feeding it malformed UTF-8 data:
/// Serializer that serializes strings as a list of char
struct CharListSerializer {
output: Vec<char>,
}
impl ser::Serializer for &mut CharListSerializer {
fn serialize_str(self, s: &str) -> Result<Self::Ok, Self::Error> {
self.output.extend(s.chars());
Ok(())
}
// Other methods simply panic
}
fn main() {
let mut ser = CharListSerializer { output: Vec::new() };
let data = "ABC";
data.serialize(&mut ser).unwrap();
assert_eq!(ser.output, ['A', 'B', 'C']);
ser.output.clear();
// An emoji: 😣
let data = [0xf0, 0x9f, 0x98, 0xa3];
let data = Raw::from_utf8(data.as_slice().to_owned());
data.serialize(&mut ser).unwrap();
assert_eq!(ser.output, ['😣']);
ser.output.clear();
// Uh oh. 0xf0 signals a 4-byte UTF-8 code point,
// but there are only two bytes in this buffer.
let data = [0xf0, 0x9f];
let data = Raw::from_utf8(data.as_slice().to_owned());
data.serialize(&mut ser).unwrap();
}
When I run this normally, I get a crash:
error: process didn't exit successfully: `target\debug\rust-playground.exe` (exit code: 0xc000001d, STATUS_ILLEGAL_INSTRUCTION)
And when I run with cargo miri run
, it immediately detects the undefined behavior:
error: Undefined Behavior: entering unreachable code
--> C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\hint.rs:51:14
|
51 | unsafe { intrinsics::unreachable() }
| ^^^^^^^^^^^^^^^^^^^^^^^^^ entering unreachable code
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `std::hint::unreachable_unchecked` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\hint.rs:51:14
= note: inside `std::option::Option::<&u8>::unwrap_unchecked` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\option.rs:877:30
= note: inside `core::str::validations::next_code_point::<std::slice::Iter<u8>>` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\str\validations.rs:56:27
= note: inside `<std::str::Chars as std::iter::Iterator>::next` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\str\iter.rs:44:18
= note: inside `std::vec::Vec::<char>::extend_desugared::<std::str::Chars>` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\mod.rs:2696:35
= note: inside `<std::vec::Vec<char> as std::vec::spec_extend::SpecExtend<char, std::str::Chars>>::spec_extend` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\spec_extend.rs:18:9
= note: inside `<std::vec::Vec<char> as std::iter::Extend<char>>::extend::<std::str::Chars>` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\mod.rs:2670:9
note: inside `<&mut CharListSerializer as serde::Serializer>::serialize_str` at src\main.rs:86:9
--> src\main.rs:86:9
|
86 | self.output.extend(s.chars());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= note: inside `<rmp_serde::Raw as serde::Serialize>::serialize::<&mut CharListSerializer>` at C:\Users\Lucre\.cargo\registry\src\github.com-1ecc6299db9ec823\rmp-serde-1.0.0\src\lib.rs:190:9
note: inside `main` at src\main.rs:211:5
--> src\main.rs:211:5
|
211 | data.serialize(&mut ser).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
error: aborting due to previous error
error: process didn't exit successfully: `C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\bin\cargo-miri.exe target\miri\x86_64-pc-windows-msvc\debug\rust-playground.exe` (exit code: 1)
This type should be either:
- Removed,
- Redesigned to use
serialize_bytes
- Marked as
unsafe
and changed to require valid utf-8.
To be frank, it's not at all clear to me why this type exists in the first place? It isn't used by RMP anywhere, and I don't really understand why a user would ever be allowed to call serialize_str
on non-UTF8 data, when the entire point of that method is that valid UTF-8 can be assumed. It appears to exist for no other purpose than to create this kind of soundness hole.
My full reproduction code is available as a gist.
Thanks for the report. It is indeed unsound.
I think this functionality could be implemented without the invalid transmute. Serde doesn't support specialization directly, but it is possible to hack it by abusing newtype serialization with a custom name. This is hack is already used for ExtSerializer
injecting _ExtStruct((_,_))
. So similarly, the library could tell serde to serialize invalid UTF-8 string as _InvalidStringStruct(&[u8])
and have msgpack serializer react to this appropriately.
I don't have time to implement this workaround, and I also don't want to break existing users even if they rely on the bad implementation, so for now I'll deprecate the Raw
/RawRef
newtypes.
Any plan for a fix?