Type-safe futures and streams
Kixunil opened this issue · 25 comments
As discussed in #206, current poll API of futures and streams takes them by mutable pointer. That would suggest those are always valid after call to poll()
, which is not the case, since poll()
shouldn't be called after future resolves.
Consuming self and returning it optionally would make more sense from type-safety perspective. It would also simplify implementations, especially when Future::Item
is not Clone
and must be stored in the future itself before resolving. (The current solution is to have it inside Option
and unwrap it each time.)
I've made a demonstration example of some concepts.
Pros of type-safe futures/streams
- No undesired behaviour (panicking, infinite blocking...) when
poll()
is called on resolved future. - No unnecessary checks (code bloat as well as performance)
- Usage errors found during compile-time as opposed to run-time
- The usage is clear from documentation
- Resolves #206. Stream which can end with fatal error is represented with
Stream::Error = SomeError
. Stream that doesn't fail fatally is represented withStream::Error = !
.
Cons of type-safe futures/streams
- Lack of object safety. This can be solved with
UnsafeFuture
trait as demonstrated in my example. Someone might consider that solution ugly. - Performance of calls to
poll()
decreases if futures are big and compiler didn't inline, since it involves copying due to pass-by-value. This might be solved with enough#[inline]
s. Also, I've invented a hack to get pass-by reference, yet type-safe behaviour. I didn't implement it yet and didn't publish anything about it anywhere. I probably will some day. It doesn't look very nice/intuitive though. :( - @alexcrichton mentioned some ergonomic issues when used inside struct fields, however he didn't provide an example and I don't understand the issue without it.
- Need to rewrite stuff.
- This area is probably unexplored and we don't know what else might come out.
Motivation
I believe the type-safe futures might be useful in cases where safety is more valued. E.g. financial applications, mission-critical software, etc.
As demonstrated in my example, the type safe and type unsafe futures/streams can co-exist using Glue
wrapper. So maybe the best solution is to have both version and use the best tool for the job. I'd like to explore this topic and know what do you think about it.
Note: Just found that #462 would be probably avoided if the Sink
was type safe from the beginning.
FWIW I feel "type safe" is the wrong name for this. That typically invokes thinking along the lines of "if you're not type-safe then you can store a String
inside &mut i32
" which clearly futures do not allow.
I personally feel the cons outweigh the pros, and have yet to be convinced of this change.
Maybe I have a different understanding of what "type safe" means. From what I understand, it also means "you can't store &[u8]
inside &str
even though they have same representation". In the same spirit you should't store resolved future inside unresolved future. Anyway, I don't want to argue about names right now and I don't have an idea for a better name meaning "type system-enforced invariants". If you have some, I'd appreciate it.
Well, I'm not convinced about "one perfect way of doing it" either. And I certainly don't want to throw away all the existing code. I believe both approaches could co-exist with their pros and cons. What I want is to explore possibilities of statically checked code.
So to address your concerns, you mentioned three:
- Lack of object safety. I found a way to solve this specific thing. Do you find my solution good enough or do you think there are other issues with that?
- Ergonomic issues. I still don't understand this well enough. I get that you're busy with all the things trying to make yourself compile-able ;) , so I don't mind if, instead of explaining, you point me to some (potentially historical) piece of code displaying that issue.
- Performance. This concerns me too. I guess it might be solved by putting things behind FutureObject, or using
Glue
, inline as much as possible and pray that it will be optimized enough. Maybe even exposeUnsafeFuture
and let people build another safe abstractions? Maybe lower performance is still better in safety-critical applications. (BTW, I'm currently developing financial software, so that's my motivation.)
Could you, please, describe whether you see some problems with my suggestions for solutions? I'd love to solve it. :)
Sure yeah, I'll dig into some more details.
So it sounds like you've got object safety taken care of, but this definitely comes at a cost. There's an extra trait and more concepts to understand (just to be clear), but at a technical level it's solved. For ergonomics much of what I'm thinking is somewhat anectodal. We prototyped a system exactly like this (without object safety) where futures were all moved by value, and that's only one incarnation of the system!
I find it difficult to articulate as to why it was so unergonomic, but in general Rust, while moving by default, typically favors borrowing in terms of ergonomics. Almost all idioms of the standard library heavily rely on borrowing to make methods, helpers, etc work out nicely. Tons of code in the ecosystem relies on borrowing in very colorful ways as well. In general it means that idioms are typically not geared towards lots of movement of ownership and ergonomics tend to suffer once movement is forced on everyone. Movement has also been explored a lot in other systems like rotor. I did not personally use rotor, however, so I can't comment myself.
In general I'm all for type systems upholding invariants where possible, but I personally at least favor ergonomics over the "type system getting in my way". For example the net2
crate does not use type state to track the state a socket is in, so you can make invalid calls at runtime. In practice, however, this rarely happens. Similarly with futures in practice you never actually poll a future after it's done. The executor (e.g. tokio-core) is the one typically managing this, so it's rarely something you need to worry about. As a result with the ergonomics/simplicity not on par with &mut
it's not worth it to me.
For performance I'd recommend profiling pieces of code here and there written both ways. IIRC Hyper received a double-digit percent boost in performance moving from rotor to futures. I don't know whether that was movement related or runtime related, but it's at least one data point.
Sorry, I was unable to reply sooner, I had a lot to do. Thank you for deep explanation!
You made very good points. I'm still dreaming about solving everything and having a perfect code. :) But it seems like it makes sense to prefer borrowing at least for now. I've been thinking about other ways to solve this but they seem ergonomically worse.
That being said, I'd very much like to improve implementation of Future
trait. One thing I had in mind was that since only reactors actually call poll
it could be unsafe
to allow move-out but that has many side-effects. Maybe if it would be possible to allow correct move-out only that would be solution.
Another thing that might be useful for some cases is equivalent of T!
types in Swift. (Basically it's like Option with Deref(Mut)
which panics on None
.) If you know of existing crate implementing it, I'm all ears. If not, this is probably the approach I could take in my project.
Of course for performance I'd definitely use profiling. I'm not concerned with performance too much, because it is important but not critical for me yet. But I can understand other people caring about it a lot.
I think I'm gonna skip worrying about this too much right now but I'd like to revisit this in the future (no pun intended), when I'll have more time.
Edit, one more thing: I feel a little bit sad every time I see (panicking) checks in code just to prevent memory bugs if programmer screwed up. So that's also one of the reasons I was exploring this. That is probably just my perfectionism...
This reminds me a bit of what @dtolnay did with Serde. I believe it was in 0.9 when the serialization system changed to move a sentinel value to indicate when serialization was done. Personally, I really like the idea of the type system being used to ensure panic-free code. "In practice, this rarely happens" is the kind of thing people would say back in the dynamic programming world, and it usually ended up meaning you'd get screwed eventually. Of course if the performance cost of that approach is too great, maybe it's not feasible.
FWIW I'd be very curious to hear about scenarios where this actually caused problems. I don't think in practice I've ever seen panics or behavior due to poll-after-future-is-done except maybe once or twice. I'm probably not the best data point though because I'm familiar with all the internals!
An interesting data point is the proposed coroutine language feature, which uses &mut self
and panics unconditionally if you resume a finished generator. (basically the same as how Future
works today)
It hasn't caused "problems" for me.
Receiving self
instead of &mut self
would simplify Futures and Streams implemented as state machine enums (I assume the coroutines proposal is the same), since they won't need to do the std::mem::replace
dance while computing a state transition, and won't need an explicit Invalid
variant just to std::mem::replace
the initial state with. I've seen existing implementations reuse the terminal Ended
state (and even did it myself) instead of an explicit Invalid
state that maps to unreachable!()
in the state transition (helps catch bugs where a state transition never happened before poll()
returned). But I'm not very keen about that, since forgetting to do a state transition may get masked by an early-terminating Future / Stream.
This isn't a particularly big deal for me though. The more common mistake I make is to accidentally not call the inner Future / Stream's poll()
during a state transition and thus end up with a stuck Future / Stream. Whether poll()
takes self
or &mut self
doesn't help with that.
I had an idea how to get performance of &mut self
with safety of self
but at the cost of ergonomics. If someone wants fast && safe at all costs, I'm willing to write it down.
Here's a scenario I recently encountered where linear typing would have helped: Implementing a handshake protocol.
This is what I wanted to implement: There's a Handshaker
struct, which contains a field stream
of some type S: AsyncRead + AsyncWrite
. Handshaker::new(stream: S, cryptostuff: &Foo)
takes ownership of a stream: S
. Taking ownership ensures that it is impossible for any other code to read or write on the stream during the handshake.
The actual handshake is performed via Handshaker::shake_hands(self) -> Result<(NegotiatedSecrets, S), (HandshakeError, S)>
. So whether the handshake succeeds or fails, ownership of the stream is returned. The API is useable, ensures that no incorrect reads/writes can mess with the handshake, all is well. (Actually the error type is somewhat simplified, you'd want to handle non-fatal errors without returning ownership of the stream. But these types suffice for the example.)
How should the corresponding async API look like? Instead of synchronously returning a Result<(NegotiatedSecrets, S), (HandshakeError, S)>
, just implement Future<Item = (NegotiatedSecrets, S), Error = (HandshakeError, S)>
. But poll
only takes a &mut self
, so you can not move self.stream
.
So in my current implementation, the Handshaker
struct has a method into_inner(self) -> S
and the future implementation uses only Item = NegotiatedSecrets
. The API consumer has to consume the future and then use handshaker.into_inner()
to retrieve the stream for further usage. Not only is this needlessly error-prone, but it also hinders composition:
Say there is a read/write wrapper type CryptoWrapper
for S: AsyncRead + AsyncWrite
. Its constructor takes an S
and a NegotiatedSecrets
. It behaves just like the wrapped stream, but it encrypts all writes and decrypts all reads. It's a oneliner to implement a convenient, synchronous method Handshaker::negotiate_secure_stream(self) -> Result<CryptoWrapper<S>, (HandshakeError, S)>
. But its not directly possible to implement Future<Item = CryptoWrapper<S>, (HandshakeError, S)>
. You can't call self.into_inner()
on a &mut self
. Ownership gets in the way again, even though I know that the future will yield a value at most once.
When I saw sink.send()
today, I got excited because it manages to provide this sort of API. So I took a look at the implementation and was disappointed, since it needs runtime checks and panics. Which lead me to this issue.
As for the performance cost: poll(self)
provides more information to the compiler than poll(&mut self) // by the way, this should never be called twice
. So in theory, shouldn't the more accurate typing allow better reasoning and thus better optimizations? Even if there would be a performance loss with the current compiler, it should eventually become the more efficient solution.
And if "need to rewrite stuff" is an actual blocker, I'd happily contribute some of my time.
I'm personally pretty tempted to close this issue. I think we've got lots of data which points in favor of where "in the small" a by-self API can provide an ergonomic and statically checked win. Once futures start scaling up, however, this becomes much less clear and I think by-value self
becomes much more unergonomic. I have a related comment about API conventions about close
for this.
@AljoschaMeyer it's true that &mut self
requires panicking in some situations, but I've never actually seen a panic in practice and I'm personally more worried about the scalability of the Future
trait rather than maximal ergnomics in the "simple cases", where I think &mut self
clearly shows benefits over by-value self. For example I can't really imagine writing this function with by-value-self and threading that through everywhere.
For example I can't really imagine writing this function with by-value-self and threading that through everywhere.
That function can accept mut self
, use it as &mut self
internally and return Async::NotReady(self)
at the end.
The reverse (using owned self
in a function that receives &mut self
) is less straightforward and requires tricks like the take_mut
crate.
That's not really the point, though. The Future
trait is the absolute core of the async ecosystem, and the idioms that it sets forth will end up getting used almost everywhere. A by-value self
method is a strong message that you should also be using by-value self everywhere. If all "real use cases" of a Future
would immediately rather switch to &mut self
that's a clear signal to me at least that it's the wrong default.
Well, I personally can't imagine either of those - that implementations of poll
would have some compulsion to use self
in every function they call from poll
, or that there would be no (non-trivial) implementation of poll
that would benefit from moving owned self
over &mut self
. But you've been involved with Rust and futures
much more so you have more experience about this.
I'm going to share my idea, so it won't be lost, although I doubt someone will want to actually implement it (but who knows?).
So the idea was to take slot: S
where S: Slot<Item=Self>
instead of self
or &mut self
.
trait Slot {
type Item;
fn get_mut(&mut self) -> &mut Self::Item;
fn take(self) -> Self::Item;
}
struct OptionSlot<'a, T: 'a> {
// This always point at Option containing Some
option: &'a mut Option<T>
}
impl<'a, T: 'a> OptionSlot<'a, T> {
fn new(option: &mut Option) -> Self {
match *option {
Some(_) => OptionSlot { option },
None => None,
}
}
}
The impl<'a, T: 'a> Slot for OptionSlot
would simply unsafely unwrap the option (because it must always be Some
)
Then the caller is responsible for checking invalid states using OptionSlot::new()
. I could imagine implementing Slot
for some cursor over vector of futures, so when one calls take()
that future is removed and thus can't be polled later. This would make the API panic-free without the performance cost of moving self
.
The obvious downside would be seemingly complicated API and lack of object safety (It may be possible to restore it somehow, but I didn't figure it out yet.)
I'd be very happy to see someone figure it out. I don't have much time for this. :(
This inspired me to do a valuable_futures crate (I think the name is funny, no?).
What it does is it has another Future
trait that receives self by value. You can't box this thing but you can convert it to a normal future via into_future
. Also since I think it's mostly useful for writing futures by hand it doesn't have combinators (but we may add them in future).
There is also a wrapper that passes both mutable pointer and by-value state. I have a lot of code should be simpler if designed this way. (If you have some suggestion for naming this thing, please open an issue in that repository)
I'll try it on some real futures soon, but here are examples:
- Using enum as future
- Using normal future for the same task
- Using Supply (a by-value future that has mutable state stored separately)
While the code is approximately of the same length and complexity, there are few tiny but imporant things:
- It's easy to forget some returns statements or mutations. While writing the example I had this line missing and got an infinite loop. And yes I've done similar errors in real applications. They are very annoying and hard to debug unless are captured by unit tests.
- It's easy to miss repeating the whole operation on timeout poll (it's needed AFAIU because timeout can return Ready on the first poll too). It might also be some other future that's easy to forget to poll. In the state machine approach you basically return new state and always poll it again to activate futures encompassed by it.
(you might question why Print
is a state, while it never returns NotReady
, the point here is to show a simple example, usually there are multiple states with some embedded futures, still such no-op steps do improve code readability sometimes)
It's just quick prototype to try and gather some comments. Docs and implementation for Stream will be done later.
@tailhook Do I understand correctly that you reinvented the wheel or am I missing something?
@tailhook Do I understand correctly that you reinvented the wheel or am I missing something?
Didn't see that. Have you given a link? (also not at crates.io)
As I see now:
- The result of
poll()
in your implementation isn'tResult
so I can't use?
operator (or is it already possible in stable rust?) - There is some unsafe code, that I have hard time to follow. I think
take().unwrap()
and converting tofutures::Future
(including for creating aBox<Future>
) do the work good enough. - Also I'm not sure rust optimizes
Glue
as well asOption<X>
(i.e. the NonZero magic) - And no Supply primitive (which by the way was idea flying around since rotor times)
(well, also usage examples would be useful)
Have you given a link?
In the top post in this thread 🤣. It's PoC code.
- It was specifically designed to use
Try
trait in the future.Result
as designed infutures
crate seems weird to me anyway, becauseAsync::Ready
is usually the thing one is interested in. unsafe
isn't something to be feared of. There will always beSome
. The only case when there'sNone
is if thepoll()
method panics and the only thing that is called later is destructor, which does check. I already expleined this in another comment.- This is PoC. If you'd like to improve it send a PR. If you want to use it, I can publish it.
I added most of the existing Future
combinators to @tailhook's code to get a feel for it, and the results were pretty expected:
-
Handling the
NotReady
case when polling a child future and returningNotReady
yourself is more verbose, because the originalSelf
type must be reconstructed from the returned child future. Eg:Chain
and almost every use ofasync_not_ready!
in that file. -
There are no more
Option
s and no cases ofpanic!("future polled twice")
, again as expected. This makes the code a bit simpler. Eg: thefutures
version ofSelect::poll
has to use a boolean instead of returning early because of the borrows, whereas thevaluable-futures
version can return early. Similarly unlike thefutures
version ofJoin*::poll
, thevaluable-futures
version does not need anything special likeerase()
since everything will be automatically dropped anyway. -
Another interesting thing about
Chain::poll
: sinceAsync::NotReady
contains the future,Chain::poll
could usePoll<B>
to receive theSecond
future from its callback. It's a fitting use ofAsync
compared to thefutures
version, whose callback returnsResult<Result>
for this with theOk(Err)
containing theSecond
future.
I've spent some time writing Future
implementations and I think it would be easier to write them with by-value poll.
Calling poll not from another Future
wrapper is much more rare case.
Wild idea: having poll
accept special trait Stealer<Self>
that can be used to get a reference or take an ownership. Stealer<T>
shall be implemented for T
, Option<T>
, Vec<T>
combined with index, OccupiedEntry
and other containers from which T
may be taken by value.
I think something like linear types rust-lang/rfcs#814 would be helpful.
@omni-viral Stealer<T>
sounds much like my idea with Slot
trait.
@Kixunil exactly the same 😄
I skipped your comment somehow
I'm going to go ahead and close this out, given that we've recently completed the 0.2 revamp and it's pretty clear that this change is not in the cards.