alexcrichton/tokio-curl

Model the partial results with a stream instead of a write_function

Opened this issue · 3 comments

Perhaps it would be more rustic to return a stream of partial results from perform? Then you could collect each chunk into a vector and chain further combinators onto it. On the downside, you gain only some ergonomic benefit at the cost of losing the straightforward call into libcurl.

let result = Arc::new(Mutex::new(Vec::new()));
{
    let result = result.clone();
    easy.write_function(move |data| {
        result.lock().unwrap().extend_from_slice(data);
        Ok(data.len())
    }).unwrap();
}
session.perform(easy)
    .map(move |_| String::from_utf8(result.lock().unwrap().clone()).unwrap())

vs.

session.perform(easy).collect::<Vec<u8>>().map(|x| String::from_utf8(x).unwrap())

The intention of this library is primarily to provide as low-cost a binding to libcurl as possible, but I'd be totally down for providing a nicer abstraction on top (perhaps not in the easy module) with more rustic interfaces like this.

Put another way, I'd avoid doing this for the Easy type itself, but having a different layer on top sounds great to me.

I'm not sure I want the original proposed syntax for this, but I certainly agree that a lot of Curl's callback functionality should have nice wrappers when used with Tokio.

I think the functionality of read_function and write_function map exactly to futures::Stream and futures::Sink, respectively.

  • Both functions allow the callback to return Err(Pause) if the stream/sink can't make progress, and then call unpause_read or unpause_write later.
  • The read callback can return Err(ReadError::Abort) and the write callback Ok(0) to abort the request if an error occurs. Propagating the actual error out somewhere so it is available to the caller would be nice.

Note that I think write_function should consume a Sink, not produce a Stream, to match the push semantics of the underlying library, and vice versa for read. If you want a Stream, you can wire one up yourself with an MPSC channel.
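To make the push-to-pull conversion concrete, here's a minimal std-only sketch (no curl or futures types; the closure is a stand-in for write_function, and std::sync::mpsc stands in for a futures channel): the callback pushes each chunk into a channel, and the receiving end can then be consumed like a stream.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    // Stand-in for the transfer: curl would invoke a callback like this
    // once per chunk of the response body.
    let producer = thread::spawn(move || {
        for chunk in [b"hel".to_vec(), b"lo".to_vec()] {
            // Equivalent of the write_function body: push the chunk onward.
            tx.send(chunk).unwrap();
        }
        // Dropping `tx` here closes the channel, signalling end-of-body.
    });

    // The consumer side now looks like a stream it can collect.
    let body: Vec<u8> = rx.into_iter().flatten().collect();
    producer.join().unwrap();
    println!("{}", String::from_utf8(body).unwrap());
}
```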

The curl crate doesn't automatically clear the callbacks registered on an Easy, because you might want to reuse them. However, that means that none of the resources owned by the callback get cleaned up at the end of the request. If one of those resources is, say, a futures::Sink, and you're depending on it getting dropped to signal that there's no more data coming, then you'll wait forever. (Guess what I spent an hour debugging today? It's easy enough to work around, but it took me a long time to identify the root cause.)
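The hang described above can be sketched with std channels standing in for a futures::Sink (a hypothetical model, not tokio-curl code): the receiver only learns "no more data" once every Sender has been dropped, and a callback left registered on the Easy keeps its Sender alive indefinitely.

```rust
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel::<u8>();

    // Model "the request finished but the callback was never cleared":
    // the closure (and the Sender it captured) is still alive.
    let leaked_callback = move |byte: u8| tx.send(byte).unwrap();
    leaked_callback(42);

    // One chunk arrives fine...
    assert_eq!(rx.try_recv(), Ok(42));

    // ...but the channel never reports disconnection, because
    // `leaked_callback` still owns a Sender. With a blocking recv()
    // this would hang; try_recv shows it as Empty, not Disconnected.
    assert_eq!(rx.try_recv(), Err(mpsc::TryRecvError::Empty));

    // Dropping the callback (what resetting write_function achieves)
    // finally closes the channel.
    drop(leaked_callback);
    assert_eq!(rx.try_recv(), Err(mpsc::TryRecvError::Disconnected));
}
```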

If the user chooses to provide a write: Sink or read: Stream to tokio-curl, then I would like tokio-curl to set corresponding callbacks before starting the request, and to reset them (e.g. req.write_function(|_| Ok(0))) as soon as the request completes, so resources in the callback don't leak.
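One way to sketch that reset-on-completion behaviour is an RAII guard (std-only; `Callbacks` here is a hypothetical stand-in for the Easy's callback slot, not a real tokio-curl type): when the guard drops at the end of the request, the user's closure is replaced, releasing whatever it captured.

```rust
use std::rc::Rc;

// Hypothetical model of an Easy's registered write callback.
struct Callbacks {
    write: Option<Box<dyn FnMut(&[u8]) -> usize>>,
}

struct ResetGuard<'a> {
    cbs: &'a mut Callbacks,
}

impl Drop for ResetGuard<'_> {
    fn drop(&mut self) {
        // Equivalent of req.write_function(|_| Ok(0)): replace the user's
        // closure so anything it captured (e.g. a Sink) is dropped now.
        self.cbs.write = Some(Box::new(|_: &[u8]| 0usize));
    }
}

fn main() {
    let resource = Rc::new(());   // stands in for a futures::Sink
    let handle = resource.clone();

    let mut cbs = Callbacks {
        write: Some(Box::new(move |data: &[u8]| {
            let _keep_alive = &handle; // closure owns the resource
            data.len()
        })),
    };
    assert_eq!(Rc::strong_count(&resource), 2);

    {
        let _guard = ResetGuard { cbs: &mut cbs };
        // ...request runs here; the callback fires some number of times...
    } // guard drops: callback replaced, captured Rc released

    assert_eq!(Rc::strong_count(&resource), 1);
}
```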

Does this seem like a sensible plan? How would you design the API for setting up the sink/stream for an Easy?

Seems like a plausible plan to me! I might recommend AsyncRead and AsyncWrite over Stream and Sink just in the sense that they're a little more optimized for shipping bytes everywhere. Otherwise resetting callbacks around each request sounds fine by me!