Request: Provide try_get_* methods
oblique opened this issue · 14 comments
Some network protocols have dynamic length packets, so we can not have a single buf.remaining()
check before starting parsing them. In that case before get_*
we need to check the remaining length to avoid assertions.
I believe try_get_*
methods that do not assert they will help us write cleaner code.
I agree this would be helpful to have. If you look at the rust-fuzz trophy case, many of the bugs in the oor
category could have been prevented with this type of interface.
I think we need to answer a couple of questions, first.
advance
and copy_to_slice
Ideally, we should include a non-panicking version advance
and copy_to_slice
. This would keep things consistent with not wanting to panic and require explicitly opting in to panicking, rather than opting out.
Change existing APIs
Would it be better, instead, to change the existing APIs to return Result<T, Error>
?
Pros
-
Considering a lot of the use cases for
Buf
handle untrusted input, I think this is the best direction long term, as it forces implementations to consider the possibility of a EOF. -
Adding a
try_
method perT
would essentially double the already-lengthy trait API and docs page. -
It would still be possible to panic, if desired:
buf.get_u8().expect("i've already checked the length");
Compared to the amount of boilerplate it currently requires to not panic:
// this also isn't guaranteed to be the length of `T` - easy to drift from the actual impl if buf.remaining() < 4 { return Err(EOF); } let v = buf.get_u32();
Cons
- It would most likely break a lot of existing usage of
Buf
(but now would be the time to do it before 1.0) - Panicking would require an additional
.unwrap()
Return type
In my mind, we have 3 options:
-
Option<T>
This option is attractive, as it doesn't introduce any new public types. My biggest problem with it, though, is it becomes kind of tedious to use in a function that returns aResult
:fn decode<B: Buf>(buf: B) -> Result<(), MyError> { let a = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?; let b = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?; let c = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?; Ok(()) }
whereas it's a bit easier to deal with the inverse:
fn decode<B: Buf>(buf: B) -> Option<()> { let a = buf.try_get_u8().ok()?; let b = buf.try_get_u8().ok()?; let c = buf.try_get_u8().ok()?; Some(()) }
You could implement
From<NoneError> for MyError
but that could happen anywhere for various reasons. -
Result<T, struct UnexpectedEnd>
Assuming all of the methods only check thatbuf.remaining() > size_of::<T>()
then this is a fine option. If at some point in the future, there's another error condition we want to signal, it would be a breaking change to use an enum instead. -
Result<T, #[non_exhaustive] enum TryGetError>
This is probably the most stable option long term, although may be overkill if it'll only ever be a single variant.
Result<T, struct UnexpectedEnd>
That's what I would lean towards. Since the APIs here are just about reading, I don't see a need for more error variants. Buf
doesn't cover serialization/deserialization of higher level data structures which could produce additional errors. Code which does that can define their own errors, and implement From<UnexpectedEnd>
for call to Buf
Result<T, struct UnexpectedEnd>
We use this pattern heavily in quinn and it works great. We even have an extension trait for all Buf
implementers to provide methods of that form for various integer types, plus other assorted trivially-represented stuff.
Would it be better, instead, to change the existing APIs to return Result<T, Error>?
I'm a fan of this approach, since it's overwhelmingly the way we use the Buf
API.
After much bikeshedding in Discord, I'm leaning back towards providing try_get_*
variants.
The reasoning is, while it makes a lot of sense for Buf
to provide getters that return Result
, it makes far less sense for BufMut
to provide setters to return Result
. The buffer is usually pre-allocated to contain the full message.
It also doesn't feel great for BufMut
to provide setters that panic and Buf
getters return Result. To maintain symmetry, Buf::get_[T]
can panic and we can add Buf::try_get_[T]
to return a result.
I'm also a big fan "no-panic by default". IMO, out-of-bounds panicking is one of the biggest design mistakes in std
; similar to Java's NullPointerException
s.
There are situations where panicking is the only reasonable default to handle errors, but OOB is usually none of them.
As far as I can tell, bytes
is often used in crates with highly untrusted input where you almost never want to panic just because some foreign input data is truncated. So the default API should be failsafe, because you can always still expect
a value if appropriate – but explicit is better than implicit here.
A try_
-API has two disadvantages:
First, it feels like it's a bit against Rust's design philosophy IMO? Currently most of Rust is like: "Ah nope, this is dangerous… You sure you wanna do this?", whereas an x
and try_x
dualism is more like: "Ok, will do… (Oh, and if you want me to perform some checks – tell me, ok?)".
I'm a big fan of types like Option<T>
or Result<T, E>
for fallible operations, because they allow/force me to notice that something can go wrong here (saved my ass numerous times 😅).
Also in this case, try_get_*
-APIs feel like 2nd-class citizens and are easy to forget – if you accidentally use get_*
instead of try_get_*?
, there is no compiler warning or lint etc., so it's easy to miss. Furthermore, there is already a get
and get_mut
-API for slices/sliceables which returns an Option<T>
, and it's pretty easy to confuse it with get_*
if you work with both types.
When it comes to the specific API, I'm in favor of returning an Option<T>
:
- It is consistent with
std
'sslice::get
andslice::get_mut
. - Not having any bytes left may not even be an error (e.g. if I parse an indefinite list of concatenated
u32
s). - If it is an error, you'd probably want to use your own error type anyway, because the errors are usually context specific (i.e. in some parts I'd e.g. want to return
InvalidHeader
whereas in other parts I might want sth. likeNeedMoreInput
as error).
So something likeimpl From<UnexpectedEnd> for MyError
is not very useful except in some edge cases IMO?
As mentioned in discord, I think something like this could work pretty well: https://godbolt.org/z/hjYbeEf5P.
We could also do a blanket impl: impl<B: TryBuf> Buf for B { ... }
and delegate calls with an expect(...)
. It does mean implementations would only be able to pick one trait to implement and if they've implemented Buf
, they wouldn't be able to implement TryBuf
, at least until specialization is stabilized.
in the meantime, this might do what you're looking for - https://github.com/danieleades/safer-bytes
@danburkert Your crate is exactly what I was looking for! Thank you so much for making it public.
i'm please to hear it might help. That inspired me to tidy it up a little
Another implementation is
use bytes::Buf;
use std::mem::size_of();
#[derive(Clone, Copy, Debug)]
pub enum ErrorKind {
EOF
}
pub trait TryBuf {
fn try_get_u8(&mut self) -> Result<u8,ErrorKind>;
// ...other try_get APIs
}
impl<T: Buf> TryBuf for T {
fn try_get_u8(&mut self) -> Result<u8,ErrorKind> {
if (size_of::<u8>() <= self.remaining()) {
Ok(self.get_u8())
} else {
Err(ErrorKind)
}
}
// ...other impls
}
This is efficient and will never break any existing usage of Buf
.
I implement this in try_buf
The issue with first checking that it is enough, and then reading, is that the reading part can very easily be changed by yourself or someone without updating the above check, causing a possible panic point. It is also quite error prone and unclear by hand calculating the size, E.g; having a literal 4
or 32
and not knowing why, which will be outdated if someone comes aloing and changes a get_u8
to a get_u16
or similar.
In other words, panics become a surprise because some code changed further down, which goes against the general design of Rust of not upholding invariants yourself, E.g parse and not validate, or using types rather than checks to commicate impossibilites and guarantees.
Furthermore, the manual method makes it very hard to validate if the size depends on data that has already been read, like branching code, which requires making each "section" do its own validating and manual calculation of what the code already knowns and encode. Similar in a way to hand calculating stack offsets based on function bodies, which we all know how well that goes as son as there is a branch.
Though this is just my take on the matter when writing networking code and attempting to uphold and remember implicit assumptions as functions may panic otherwise with no clear indication that they will based on signature.
Hello,
Pretty new to Rust and I just ran into this issue in late 2023; would be great to get these methods into the bytes crate. For now I've added the trait suggested by @wheird-lee into my project.
It's worked a treat, thanks 👍
Happy to help with this where I can
I think this is really needed. If you're decoding a non-trivial protocol that takes untrusted input, then it will be far superior to today's API. The type system will in effect end up enforcing safe and correct behaviour, while today's API is error prone and can lead to code that panics on bad input. You will perhaps loose a little speed, but on the other hand modern optimizing compilers are amazing at removing unnecessary checks, so this might not actually be the case.
This issue has been open for some time. If I just went ahead and wrote a "TryBuf" as a blanket implementation for Buf, what are the chances that such a PR would get accepted?
This issue has been open for some time. If I just went ahead and wrote a "TryBuf" as a blanket implementation for Buf, what are the chances that such a PR would get accepted?
I missed the open PR on this. Is there any chance of that getting merged? It looks like just what many of us have wished for. It looks complete at first glance (with the possible exception of tests), but I'll be happy to help out if needed.