zero-copy and lazy rows deserialization
wyfo opened this issue · 8 comments
Currently, `QueryResult` deserialization deserializes all rows eagerly, and does a lot of allocations:
- a vector for all rows;
- for each row, a vector for all columns;
- for each blob/string column, a copy of the bytes into a newly allocated vector/string.
The last point, which can be quite significant in workflows with a lot of strings/heavy blobs, is pretty easy to address. Indeed, the crate already uses `bytes::Bytes`, so the query result's bytes could be kept alive through deserialization: blob columns could be deserialized as `Bytes`, while string columns could use `string::String<Bytes>`. I don't know the reason why raw `&[u8]` is used; maybe it's because of the API of `byteorder`, but the `bytes` crate also provides an endian-aware API, so `byteorder` could be dropped in favor of `bytes`.
Allocating a vector for all rows is also a relative overhead for queries returning only one row. Instead of deserialized rows, the raw `Bytes` could be stored in `QueryResult`, which could then have a method returning an iterator that deserializes rows on each iteration. It would still be possible to obtain a vector of rows just by collecting the iterator.
Column deserialization could also avoid using a vector, by using a trait system similar to `FromRow` to deserialize rows into tuples. By the way, the compatibility of the tuple could be checked only once, before iterating the rows (row deserialization would still return a `Result`, because it must still check that there are enough bytes). The old API could still be accessible by making `Vec<Option<CqlValue>>` implement the row deserialization trait; actually, there could also be an `Iterator<Item=Option<CqlValue>>` implementing the trait.
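To make this concrete, here is a rough sketch of what such a trait and lazy iterator could look like; every name here (`DeserializeRow`, `RowIterator`, and the `ColumnSpec`/`ParseError` stand-ins) is a placeholder, not the actual driver API:

```rust
use std::marker::PhantomData;

use bytes::Bytes;

// Hypothetical stand-ins for the driver's real types.
pub struct ColumnSpec;
#[derive(Debug)]
pub struct ParseError;

/// Types deserializable from a single row of the raw frame; tuples would
/// implement it, and so would `Vec<Option<CqlValue>>` to keep the old API.
pub trait DeserializeRow<'frame>: Sized {
    /// Checked once against the column specs, before iteration starts.
    fn type_check(specs: &[ColumnSpec]) -> Result<(), ParseError>;
    /// Called for each row; only has to check that enough bytes remain.
    fn deserialize(frame: &'frame Bytes, offset: &mut usize) -> Result<Self, ParseError>;
}

/// Iterator that deserializes one row per `next()` call, straight from
/// the frame kept in `QueryResult`.
pub struct RowIterator<'frame, R> {
    frame: &'frame Bytes,
    offset: usize,
    remaining_rows: usize,
    _marker: PhantomData<R>,
}

impl<'frame, R: DeserializeRow<'frame>> Iterator for RowIterator<'frame, R> {
    type Item = Result<R, ParseError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining_rows == 0 {
            return None;
        }
        self.remaining_rows -= 1;
        Some(R::deserialize(self.frame, &mut self.offset))
    }
}
```

Collecting the iterator into `Vec<_>` recovers the current eager behavior, so nothing is lost for users who want a vector.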
To illustrate these points, I've implemented a quick and dirty POC in a branch in order to run some quick benchmarks; they show a (very) significant improvement in terms of memory consumption and performance. I can open a draft PR to make the POC easier to review.
Of course, some of these changes would be breaking:
- part of the deserialization API is public, e.g. `Response`, but it has to be modified to use `Bytes` instead of `&[u8]`;
- `QueryResult.rows` is public, but it would have to be replaced, for example by `bytes: Bytes, row_count: usize`;
- `CqlValue::Blob`/`CqlValue::Text`/`CqlValue::Ascii` should use `Bytes`/`String<Bytes>`; actually, this change may not be required, as raw `CqlValue` may not be used so much anymore.
On the other hand, the typed API `rows_typed`/`single_row_typed`/etc. (and even `rows()`) could stay relatively untouched, as `FromRow` could implement the deserialization trait too, and the breaking changes above seem quite minor to me.
P.S. There are also some strings in `ColumnSpec` and `ColumnType` which could also be modified to use `string::String<Bytes>`; it would save the few remaining allocations, while being a minor breaking change.
I believe we already have a very similar issue: #462.
I agree that the eager allocations are a problem. Both your approach and the one described in #462 address the issue by pointing to the memory of the original, unserialized frame. However, the main difference is with respect to lifetimes: you suggest using reference counting, and I suggest using explicit lifetimes. I think there is value in both - reference counting is easier to use, but explicit lifetimes get rid of the (AFAIK atomic) reference counting and make it possible to use standard library types such as `&str` and `&[u8]` instead of `string::String<Bytes>` and `Bytes`.
We could unify both approaches if we used the interface from #462. Instead of `deserialize` taking `data: &'a [u8]`, we could pass `&'a Bytes`, so that it is possible both to borrow the unserialized frame and to refer to it via shared-pointer semantics.
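A minimal sketch of what that unified interface could look like (the trait name and shape are assumptions, not the final API): borrowing impls deref into the frame, while ref-counting impls take a shallow clone.

```rust
use bytes::Bytes;

// Hypothetical stand-in for the driver's error type.
#[derive(Debug)]
pub struct ParseError;

pub trait DeserializeCql<'a>: Sized {
    fn deserialize(frame: &'a Bytes) -> Result<Self, ParseError>;
}

// Zero-copy borrow: `Bytes` derefs to `[u8]`.
impl<'a> DeserializeCql<'a> for &'a [u8] {
    fn deserialize(frame: &'a Bytes) -> Result<Self, ParseError> {
        Ok(&frame[..])
    }
}

// Shared-pointer semantics: cloning a `Bytes` only bumps the refcount.
impl<'a> DeserializeCql<'a> for Bytes {
    fn deserialize(frame: &'a Bytes) -> Result<Self, ParseError> {
        Ok(frame.clone())
    }
}
```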
There is one potential argument against reference counting that I can see. Let's say that you fetch a large page of results, but you decide to keep only one `string::String<Bytes>` from it. The `Bytes` needs to refer to the unserialized response frame and will keep it alive, leading to a space leak. I'm not sure how frequently this problem would happen in practice, but it definitely makes memory usage harder to reason about. This problem does not happen with explicit lifetimes, as the user of the types that borrow from the frame needs to manually keep the frame alive or convert the types to their owned variants.
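For illustration, a tiny self-contained example of that retention behavior (the sizes are arbitrary):

```rust
use bytes::Bytes;

fn main() {
    let frame = Bytes::from(vec![0u8; 10 * 1024 * 1024]); // a 10 MiB "page"
    let tiny = frame.slice(0..16); // shallow slice: no copy, refcount bump
    drop(frame);
    // `tiny` is 16 bytes long, but it keeps the whole 10 MiB allocation
    // alive until it is dropped itself.
    assert_eq!(tiny.len(), 16);
}
```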
P.S: I have a work-in-progress branch for #462; I worked on it from time to time, but it's quite untidy and I didn't even manage to make it compile yet: https://github.com/piodul/scylla-rust-driver/tree/462-more-efficient-deserialization
I don't know how I missed #462 ... -_-'
> We could unify both approaches if we used the interface from #462. Instead of `deserialize` taking `data: &'a [u8]`, we could pass `&'a Bytes`, so that it is possible both to borrow the unserialized frame and to refer to it via shared-pointer semantics.
Agree 100% with it! It's indeed way better this way. I went with `Bytes` because of my use case, spawning tasks with a `'static` lifetime, but I have a lot of smaller queries where `&'a str` would be better suited. However, I think we will need to pass both `&mut &'a [u8]` and `&'a Bytes`: the first one being used to deserialize, and the second one to call `Bytes::slice_ref` when needed.
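A sketch of that two-parameter variant (names assumed, and the impl is simplified): the `&mut &'a [u8]` cursor is advanced while parsing, and the `&'a Bytes` is only consulted when an owned value is requested.

```rust
use bytes::Bytes;

#[derive(Debug)]
pub struct ParseError;

pub trait DeserializeCql<'a>: Sized {
    fn deserialize(data: &mut &'a [u8], frame: &'a Bytes) -> Result<Self, ParseError>;
}

impl<'a> DeserializeCql<'a> for Bytes {
    fn deserialize(data: &mut &'a [u8], frame: &'a Bytes) -> Result<Self, ParseError> {
        // For brevity, take the whole remaining input as the value.
        let value = *data;
        *data = &[];
        // `slice_ref` maps the borrowed sub-slice back onto the
        // ref-counted frame (it panics if `value` is not inside `frame`,
        // which holds as long as `data` points into the frame).
        Ok(frame.slice_ref(value))
    }
}
```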
> There is one potential argument against reference counting [...] The `Bytes` needs to refer to the unserialized response frame and will keep it alive, leading to a space leak.
I agree too, but as stated above, giving the choice to the user solves this issue IMO.
> P.S: I have a work-in-progress branch for #462; I worked on it from time to time, but it's quite untidy and I didn't even manage to make it compile yet: https://github.com/piodul/scylla-rust-driver/tree/462-more-efficient-deserialization
Actually, our works are pretty similar... too similar, in fact; my inspiration could look questionable.
Before continuing, we need to answer some questions:
- Should we modify `CqlValue` to use either `&[u8]` or `Bytes`? I think we could try to make it generic, `CqlValue<Blob=Vec<u8>>`, with the ability to choose between `Vec<u8>`, `Bytes` and `&[u8]` (I don't know if this is feasible, but why not try); see the sketch after this list.
- Should we try to keep the current API `rows_typed`/`single_row`/etc., and maybe deprecate it?
- A little parenthesis about the error type: `ParseError` could be cleaned up a little bit, as `ParseError::CqlTypeError` seems to no longer be used, and I think the only `io::Error` of `ParseError::IoError` should be the `UnexpectedEof` raised by `byteorder`, but it should then be a `ParseError::BadIncomingData`.
- What about the strings contained in metadata? Should we use `string::String<Bytes>`, or hack it by storing only offsets into the frame bytes and retrieving the strings with a method? That would mean no more shallow clones, but a less intuitive API. As it seems to me that metadata is rarely used directly (it mainly serves deserialization, and only the `ColumnType`, not the strings), I think it would be OK to optimize it this way.
- Should we close this issue as a duplicate? Or keep this one because the discussion seems to have happened here? Or keep both?
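Here is a rough sketch of the generic-`CqlValue` idea from the first question; the type parameter and the aliases are hypothetical, and only a few variants are shown:

```rust
use bytes::Bytes;

#[derive(Debug, Clone)]
pub enum CqlValue<B = Vec<u8>> {
    Blob(B),
    Boolean(bool),
    Int(i32),
    // ... the remaining variants would stay as they are today ...
}

// The three instantiations discussed above:
pub type OwnedCqlValue = CqlValue<Vec<u8>>;         // current behavior
pub type SharedCqlValue = CqlValue<Bytes>;          // ref-counted, zero-copy
pub type BorrowedCqlValue<'a> = CqlValue<&'a [u8]>; // borrowed, zero-copy
```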
Maybe I can open a draft PR to start a more precise discussion about implementation/naming/etc.
> What about the strings contained in metadata? Should we use `string::String<Bytes>`, or hack it by storing only offsets into the frame bytes and retrieving the strings with a method? That would mean no more shallow clones, but a less intuitive API. As it seems to me that metadata is rarely used directly (it mainly serves deserialization, and only the `ColumnType`, not the strings), I think it would be OK to optimize it this way.
Actually, I've just realized that it's not really necessary to deserialize the metadata. In fact, there could be a column type parsing iterator, in the same way as there will be a row parsing iterator; the strings would just be skipped.
Of course, fully deserialized metadata could still be available through a dedicated method.
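A minimal sketch of such a column type iterator; all names are hypothetical, `ColumnType` is reduced to two variants, and the column specs are assumed to be in the compact form (global table spec flag set), i.e. each spec is just `<name><type>`:

```rust
#[derive(Debug)]
pub struct ParseError;

// Stand-in for the driver's `ColumnType`.
#[derive(Debug)]
pub enum ColumnType {
    Int,
    Text,
}

/// Skips a protocol [string] (2-byte big-endian length + bytes) without
/// allocating anything.
fn skip_string(buf: &mut &[u8]) -> Result<(), ParseError> {
    if buf.len() < 2 {
        return Err(ParseError);
    }
    let len = u16::from_be_bytes([buf[0], buf[1]]) as usize;
    if buf.len() < 2 + len {
        return Err(ParseError);
    }
    *buf = &buf[2 + len..];
    Ok(())
}

/// Reads the 2-byte option id of a column type (simplified: ignores
/// parameterized types such as lists or maps).
fn parse_column_type(buf: &mut &[u8]) -> Result<ColumnType, ParseError> {
    if buf.len() < 2 {
        return Err(ParseError);
    }
    let id = u16::from_be_bytes([buf[0], buf[1]]);
    *buf = &buf[2..];
    match id {
        0x0009 => Ok(ColumnType::Int),
        0x000D => Ok(ColumnType::Text),
        _ => Err(ParseError),
    }
}

pub struct ColumnTypeIterator<'frame> {
    buf: &'frame [u8],
    remaining: usize,
}

impl<'frame> Iterator for ColumnTypeIterator<'frame> {
    type Item = Result<ColumnType, ParseError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        // The column name is skipped, not deserialized; only the type
        // is produced.
        Some(skip_string(&mut self.buf).and_then(|()| parse_column_type(&mut self.buf)))
    }
}
```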
> Agree 100% with it! It's indeed way better this way. I went with `Bytes` because of my use case, spawning tasks with a `'static` lifetime, but I have a lot of smaller queries where `&'a str` would be better suited. However, I think we will need to pass both `&mut &'a [u8]` and `&'a Bytes`: the first one being used to deserialize, and the second one to call `Bytes::slice_ref` when needed.
Using `&'a mut Bytes` should be sufficient (I forgot about the `mut` before). `Bytes` implements `Deref<Target=[u8]>`, so you can borrow directly from it; no need to use `Bytes::slice_ref`.
> Before continuing, we need to answer some questions:
>
> * Should we modify `CqlValue` to use either `&[u8]` or `Bytes`? I think we could try to make it generic, `CqlValue<Blob=Vec<u8>>`, with the ability to choose between `Vec<u8>`, `Bytes` and `&[u8]` (I don't know if this is feasible, but why not try).
I can see how `CqlValue` could be changed to a generic; however, I can see some challenges:

- there are some variants that allocate, and I'm not sure there are ref-counted or borrowed equivalents for them, notably `BigInt` and `BigDecimal`;
- `CqlValue` itself needs to allocate in the case of UDTs, collections and tuples. How do we handle that? Should the borrowed and ref-counted variants use iterators that lazily deserialize the compound type?

I'm OK with postponing those issues for later, as the API suggested in #462 would allow deserializing query results directly into the types requested by the users. `CqlValue` would still be a type which owns its data.
> * Should we try to keep the current API `rows_typed`/`single_row`/etc., and maybe deprecate it?
Why would you want to deprecate/remove them?
> * A little parenthesis about the error type: `ParseError` could be cleaned up a little bit, as `ParseError::CqlTypeError` seems to no longer be used, and I think the only `io::Error` of `ParseError::IoError` should be the `UnexpectedEof` raised by `byteorder`, but it should then be a `ParseError::BadIncomingData`.
I agree, the error hierarchy needs some rethinking and simplification...
> * Should we close this issue as a duplicate? Or keep this one because the discussion seems to have happened here? Or keep both?
Let's keep both issues open and close them later in one go. Both of them contain valuable information IMO.
> Maybe I can open a draft PR to start a more precise discussion about implementation/naming/etc.
Sure, sounds like a good idea. We can continue the discussion on the PR.
> Using `&'a mut Bytes` should be sufficient (I forgot about the `mut` before). `Bytes` implements `Deref<Target=[u8]>`, so you can borrow directly from it; no need to use `Bytes::slice_ref`.
In fact, you can't borrow a slice and call `Bytes::advance` on the same `Bytes`. So you need to have an immutable `Bytes` buffer to borrow immutable slices from (or to shallow-clone a new `Bytes`), and a mutable slice/offset to keep track of the parsing progress. The slice and the offset carry roughly the same information, but a slice is a fat pointer (2 `usize`s) vs 1 `usize` for the offset, so maybe the latter could be a little more efficient to pass around. I need to bench that.
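A minimal, self-contained demonstration of the conflict and of the immutable-buffer-plus-offset workaround (not driver code):

```rust
use bytes::Bytes;

fn main() {
    let frame = Bytes::from_static(b"hello world");

    // This variant does not compile: `Buf::advance` takes `&mut self`
    // while the slice still borrows `frame` immutably through `Deref`:
    //
    //     let slice: &[u8] = &frame[..5];
    //     frame.advance(5); // rejected: `frame` is borrowed by `slice`
    //     println!("{:?}", slice);

    // The workaround described above: keep the `Bytes` immutable and
    // track parsing progress with a separate offset.
    let offset = 0usize;
    let slice: &[u8] = &frame[offset..offset + 5]; // borrowing is fine
    let new_offset = offset + 5; // "advance" without touching `frame`
    assert_eq!(slice, b"hello");
    assert_eq!(&frame[new_offset..], b" world");
}
```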
> I'm OK with postponing those issues for later [...] `CqlValue` would still be a type which owns its data.
I'm fine with that.
> Why would you want to deprecate/remove them?
The error type will change, as it will have to include parsing errors, but that's a minor concern. However, these methods also take `QueryResult` by value; this is not an issue for the untyped methods, as `CqlValue` owns its data, but it prevents the typed methods from deserializing to `&[u8]`/`&str`, defeating one of the purposes of the refactoring. Maybe we can cheat and modify the interface to pass `self` by reference instead, so it should not break most of the code.
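A toy example of why by-value `self` forces copies (the `QueryResult` stand-in is hypothetical and stripped down to one field):

```rust
use bytes::Bytes;

struct QueryResult {
    bytes: Bytes,
}

// Taking the result by value means the frame is dropped when this
// function returns, so the compiler rejects returning `&str`/`&[u8]`
// borrowed from it:
//
//     fn rows_typed(result: QueryResult) -> &'static [u8] {
//         &result.bytes[..] // rejected: reference to local data
//     }
//
// The only option left is copying the data out.
fn rows_typed(result: QueryResult) -> Vec<u8> {
    result.bytes.to_vec()
}

fn main() {
    let result = QueryResult { bytes: Bytes::from_static(b"abc") };
    assert_eq!(rows_typed(result), b"abc".to_vec());
}
```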
There is also a discussion to be had about `FromRow`: as written earlier, `FromRow` could implement the row deserializer trait, but that would imply a temporary deserialization to `Vec<Option<CqlValue>>`, with a big overhead. That's why I think the trait should be deprecated.
> In fact, you can't borrow a slice and call `Bytes::advance` on the same `Bytes`. So you need to have an immutable `Bytes` buffer to borrow immutable slices from (or to shallow-clone a new `Bytes`), and a mutable slice/offset to keep track of the parsing progress. The slice and the offset carry roughly the same information, but a slice is a fat pointer (2 `usize`s) vs 1 `usize` for the offset, so maybe the latter could be a little more efficient to pass around. I need to bench that.
OK, I see the problem now... I'm not sure what would be the best way to deal with that. It sounds like we would like to have something that behaves like a slice, but allows taking ownership of it via `Bytes::slice_ref`. Maybe something like this already exists?
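A minimal hand-rolled sketch of such a type (the name `FrameSlice` and its shape are made up for illustration):

```rust
use bytes::Bytes;

// Acts like a slice, but can be upgraded to an owned `Bytes`.
#[derive(Clone, Copy)]
struct FrameSlice<'frame> {
    frame: &'frame Bytes, // the whole immutable frame
    slice: &'frame [u8],  // the region this value refers to
}

impl<'frame> FrameSlice<'frame> {
    fn as_slice(&self) -> &'frame [u8] {
        self.slice
    }

    // Take ownership: maps the borrowed region back onto the
    // ref-counted buffer (shallow copy, no allocation).
    fn to_bytes(&self) -> Bytes {
        self.frame.slice_ref(self.slice)
    }
}

fn main() {
    let frame = Bytes::from_static(b"hello world");
    let fs = FrameSlice { frame: &frame, slice: &frame[0..5] };
    assert_eq!(fs.as_slice(), b"hello");
    let owned: Bytes = fs.to_bytes(); // keeps the frame alive on its own
    assert_eq!(&owned[..], b"hello");
}
```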
> The error type will change, as it will have to include parsing errors, but that's a minor concern. However, these methods also take `QueryResult` by value; this is not an issue for the untyped methods, as `CqlValue` owns its data, but it prevents the typed methods from deserializing to `&[u8]`/`&str`, defeating one of the purposes of the refactoring. Maybe we can cheat and modify the interface to pass `self` by reference instead, so it should not break most of the code.
I remember that I ran into this problem while working on my work-in-progress implementation, and I had to introduce an `as_typed` method so that borrowing is possible. The `into_typed` method should still be usable with the types that own their data; AFAIK, if we restrict the types that the function can return to types with a `'static` lifetime, then it should work.
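A compilable toy model of that `as_typed`/`into_typed` split (all types here are simplified stand-ins, not the driver's API): the `'static` bound on `into_typed` is exactly what rules out row types that borrow from the consumed frame.

```rust
use bytes::Bytes;

struct QueryResult {
    bytes: Bytes,
}

trait DeserializeRow<'frame>: Sized {
    fn deserialize(frame: &'frame [u8]) -> Self;
}

// A borrowing row type...
impl<'frame> DeserializeRow<'frame> for &'frame [u8] {
    fn deserialize(frame: &'frame [u8]) -> Self {
        frame
    }
}

// ...and an owning one.
impl<'frame> DeserializeRow<'frame> for Vec<u8> {
    fn deserialize(frame: &'frame [u8]) -> Self {
        frame.to_vec()
    }
}

impl QueryResult {
    // Borrowing accessor: `R` may hold references into `self`.
    fn as_typed<'s, R: DeserializeRow<'s>>(&'s self) -> R {
        R::deserialize(&self.bytes)
    }

    // Consuming accessor: `R: 'static` rules out borrowing types,
    // so the frame can be dropped safely.
    fn into_typed<R: for<'s> DeserializeRow<'s> + 'static>(self) -> R {
        R::deserialize(&self.bytes)
    }
}

fn main() {
    let result = QueryResult { bytes: Bytes::from_static(b"abc") };
    let borrowed: &[u8] = result.as_typed();
    assert_eq!(borrowed, b"abc");
    let owned: Vec<u8> = result.into_typed();
    assert_eq!(owned, b"abc".to_vec());
}
```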
> There is also a discussion to be had about `FromRow`: as written earlier, `FromRow` could implement the row deserializer trait, but that would imply a temporary deserialization to `Vec<Option<CqlValue>>`, with a big overhead. That's why I think the trait should be deprecated.
I agree about `FromRow`, it will probably become obsolete.