scylladb/scylla-rust-driver

Make paging state strongly typed

Opened this issue · 12 comments

Currently, paging state is represented both in the driver's internals and in the public API as Bytes blob. As it's no use allowing the users to construct paging state on their own, I suggest creating an opaque newtype for it, with a private constructor.

Additionally, the core benefit that Bytes bring - being able to subslice a bigger allocation - does not apply in the case of paging state. I believe that paging state could be better represented as Arc<[u8]>.

As it's no use allowing the users to construct paging state on their own, I suggest creating an opaque newtype for it, with a private constructor.

They may want to serialize paging state and deserialize it later

As it's no use allowing the users to construct paging state on their own, I suggest creating an opaque newtype for it, with a private constructor.

They may want to serialize paging state and deserialize it later

Can't this be said about any struct that we intentionally restrict access to?

No, this is actually existing use case, that's how you can implement paging e.g. in your web service.
Even Java Driver docs mention this use case: https://java-driver.docs.scylladb.com/scylla-3.11.2.x/manual/paging/index.html
It's hardly comparable to serializing any other random struct.

Then it would make sense to expose constructor (from_raw) and accessor (into_raw/into_inner).

What is the problem with current version (just using Bytes)?
Custom struct will force users to create their own newtype wrapper to be able to serialize it, that's 2 more levels of abstraction for which I don't see a compelling benefit

What is the problem with current version (just using Bytes)?

The answer is:

Additionally, the core benefit that Bytes bring - being able to subslice a bigger allocation - does not apply in the case of paging state. I believe that paging state could be better represented as Arc<[u8]>.

When I see Bytes, I expect some shared ownership of the whole frame. This is misleading, because paging state's Bytes are currently created from an exclusively owned buffer that is distinct from the frame Bytes.

Custom struct will force users to create their own newtype wrapper to be able to serialize it

Not if we provide the two functions I mentioned:

Then it would make sense to expose constructor (from_raw) and accessor (into_raw/into_inner).

Custom struct will force users to create their own newtype wrapper to be able to serialize it

Not if we provide the two functions I mentioned:

Yes, even then, because people use frameworks for serialization (serde, rkyv etc)

Then it would make sense to expose constructor (from_raw) and accessor (into_raw/into_inner).

What is the problem with current version (just using Bytes)?

The answer is:

Additionally, the core benefit that Bytes bring - being able to subslice a bigger allocation - does not apply in the case of paging state. I believe that paging state could be better represented as Arc<[u8]>.

When I see Bytes, I expect some shared ownership of the whole frame. This is misleading, because paging state's Bytes are currently created from an exclusively owned buffer that is distinct from the frame Bytes.

Now I don't understand at all. You don't want Bytes because it implies shared ownership (it doesn't, can be used here without any problem), and you propose Arc<[u8]> which is definitely a shared-ownership. I'm lost.

I expect some shared ownership of the whole frame.

Perhaps I should have made this bold:

I expect some shared ownership of the whole frame.

Bytes enable subslicing while retaining shared ownership; instead, when you have an Arc<[u8]> in hand, you know the exact size of the allocation.

This makes huge difference when you want to have control over memory allocated:

  • when you hold a Bytes that refer to the result frame, they keep it allocated all the time;
  • when you hold a Bytes pointing to the paging state only, you might wonder whether they keep the frame allocated;
  • when you hold an Arc<[u8]> pointing to the paging state only, you have no such worries.

Such precaution (about using Bytes carefully` in deserialization) was included in this the deserialization framework refactor by @piodul.

User of the driver doesn't care about the frame and has no reason to think about it in terms of result frame. This is something only maintainer of this driver would do.

As discussed face to face, deserialization refactor is going to enable the user deserialize frame subslices into Bytes, so the user will have to care about the result frame.
I, as the maintainer, get confused about Bytes being used there.