CAD97/pointer-utils

Doc review!

Opened this issue ยท 16 comments

CAD97 commented

I try to write great documentation, but I need others' eyes to make sure it's good. Check out the master documentation and put any feedback in this thread. (Actionable issues may of course still get their own issues.)

Neat crate! I like it!! ๐Ÿ™‚

Regarding https://cad97.github.io/pointer-utils/erasable/trait.ErasablePtr.html#safety, I think adding a counter example there would be helpful. It's kind of the same issue that the pin docs had for some time: it is hard to visualize the invariants without counterexamples for each missing invariant ๐Ÿ˜…

CAD97 commented

The requirements of ErasablePtr are hard to create counterexamples for, because they primarily are protecting against degenerate cases ๐Ÿ˜… I'm actually failing to come up with a "useful" example that would actually be able to roundtrip through an erased pointer and fail to uphold the requirements, because failing to do so is such a degenerate case.

Two simple degenerate cases I can actually come up with are shown in #26.

I'd really love to see a detailed rationale for slice-dst's soundness. The existing documentation handwaves it, and the implementation appears to have hardcoded assumptions about the layout rustc uses for fat pointers. I didn't see any comments or documentation justifying these assumptions, and as far as I can tell the layout in question is explicitly not guaranteed to be stable.

CAD97 commented

implementation appears to have hardcoded assumptions about the layout rustc uses for fat pointers

This isn't quite right.

I make no assumption of how the pointer itself is laid out. ptr::slice_from_raw_parts_mut is a standard library way to construct a fat pointer from a thin pointer and the slice length (fn<T>(*mut T, usize) -> *mut [T]).

After that, it's just the heap structure that we have to make line up. Box is guaranteed to use the global allocator, and for manual alloc/dealloc using Rust's global allocator to be allowed. Array layout and slice layout are specified to be the obvious (as provided by Layout::array). #[repr(C)] is specified layout, and now we've found specification for the entire heap layout of the type we're allocating.

If you've got a suggestion for where to elaborate on these details, I'd be happy to add some notes. It might help to note that everything in layout_polyfill.rs is a polyfill for feature(alloc_layout_manipulation), which stabilized in Rust 1.44 (and I'm doing a patch to use it on Rust ^1.44).

If you've got a suggestion for where to elaborate on these details, I'd be happy to add some notes.

The code that makes me particularly anxious is in StrWithHeader::new and the analogous, but more complex, case for slices. Reviewing more carefully, it looks like this is populating the target of the fat pointer rather than the fat pointer itself--but then why is there a length field, since that's already represented in the metadata?

More broadly, it would be nice to have an discussion of the strategy in the module docs, in considerably more detail than "we can manually allocate the Box" and "rust handles it".

#[repr(C)] is specified layout

I can't find any use of #[repr(C)] in slice-dst whatsoever.

CAD97 commented

#[repr(C)]

SliceWithHeader and StrWithHeader are #[repr(C)]; this is what allows us to know their layout and write them into a pointee-uninitialized pointer.

but then why is there a length field, since that's already represented in the metadata?

Ah, yes, that might be confusing. The length is redundantly stored inline as well such that the types are Erasable -- pointers to them can be roundtripped through thin untyped pointers. If I could rewind time, I would try harder to have this functionality be another layer around [Str|Slice]WithHeader, probably, but I've yet to find a truly satisfactory design (and it would require a breaking version bump at this point anyway).

The fact that these types always carry the slice length inline is one at least partially of legacy; the predecessor crate of thin-dst was primarily about supporting thin pointers to slice-based DSTs, and triomphe, the real prior art, also did it this way.

SliceWithHeader and StrWithHeader are #[repr(C)]; this is what allows us to know their layout and write them into a pointee-uninitialized pointer.

Whoops, indeed they are. Somehow I skimmed that on manual review, and I guess rg parsed it as regex syntax...

The length is redundantly stored inline as well such that the types are Erasable -- pointers to them can be roundtripped through thin untyped pointers.

Would it make sense to feature-guard them, at least? That would both reduce overhead and make their function clearer.

CAD97 commented

Would it make sense to feature-guard them, at least?

The problem with just doing that is that then all members of the structs are pub. This makes re-adding the not-pub field a breaking change.

I've got the WithHeader<Header, str>/WithHeader<Header, [T]>/WithHeader<WithLength<Header>, [T]>> design stuck in my head again, I'll probably spend some time seeing if I can make that work and deprecate the old names.

The problem with just doing that is that then all members of the structs are pub

#[non_exhaustive]? A private () field?

I'll probably spend some time seeing if I can make that work and deprecate the old names.

That works too; good luck!

CAD97 commented

#63 adds some big-picture documentation to slice-dst.

I think you mean #64 ;)

Thoughts after reading the slice_dst page: https://docs.rs/slice-dst/1.5.1/slice_dst/

I was very confused by the fact that the initializations of the Nodes in the second example don't have Arc::news. I missed the definition of Node and just noticed that they were not arcs and the diagram didn't make sense in light of that. (and on reflection, it wouldn't be possible to have a slice of them if they weren't Arc'd, as you really couldn't have an unsized type directly in a slice)

The example seems a bit odd, why would you want this functionality for a tree datatype? Trees are generally quite happy with indirected child lists.

I don't know what an obvious and natural example would look like, but I think it would narrow in on the amazing thing going on here if we used my application as the example:

I'm implementing CSR++, an updatable graph datastructure. It has a series of segments for its vertices. Each segment has a lock at the start and then a long series of vertices (length not known at compile-time). The segment pointers should be as well packed in the vertex array as possible, meaning that we want the length of the vertex array to be on the other side of the pointer, we don't want the pointer to be fat like rust pointers generally want to be, and then we want the lock on the segment to be with the vertex array, we want the vertices to be right there, not on the other side of yet another jump.

An aside, can slice-dst do that? Can it avoid having a fat pointer even though it's pointing to an unsized type? Does rust allow any way to point to an unsized type with a non-fat pointer?

No, it canโ€™t.

Though there is ThinArc in another crate triomphe, which allows you to store unsized data using one single pointer.

I recently submit a PR to triomphe to support another crate arc-swap, which can swap Arc automatically.

The PR is merged, but the repo owner hasnโ€™t released a new version yet.

CAD97 commented

Actually, yes, this is a design point of slice-dst, or, rather, of erasable (which is also in this repo).

(The reason for the tree example is that these crates were originally written while hacking on Rowan, the syntax tree impl for rust-analyzer, with an eye specifically on minimizing allocations, but keeping tree-node-ownership (i.e. no vec+index handles), with the result being thin pointers to nodes which keep their children inline. Rowan also has the specific advantage of being (mostly) an immutable backing store.)

Specifically, using [erasable::Thin](https://docs.rs/erasable/1.2.1- /erasable/struct.Thin.html), you can wrap any pointer that implements erasable::ErasablePtr and points to a erasable::Erasable payload. Or, in plainer words, a smart pointer which you can into_raw into effectively void*C and recover back the fat pointer knowing only the (thin) pointer and type.

If you use SliceWithHeader, it contains the length inline, so it implements Erasable. As such, you could have struct Node(Thin<Arc<SliceWithHeader<SizedData, Node>>>); and Node would be a thin pointer to the length, header data, and (thin) child pointers in one allocation.

While the pointer-utils crates are more composable, triomphe is imo still more user-friendly, because slice-dst is very hard to use effectively. (Also, it requires first creating a Box of your type and then copying it to an Arc, since we use the std types and can't allocate an Arc directly, whereas since triomphe wrote their Arc they can skip the copy.) The main advantage of slice-dst and erasable is that you're using the standard containers.

Once feature(ptr_metadata) is stable, I believe I can make this process much easier. In the time after I originally released slice-dst, I've reconsidered the API surface multiple times, and aren't a big fan of it anymore. I'm actively experimenting with ptr_metadata and plan to eventually release indyn as a new approach to the custom DST type support.

With indyn down the road (the biggest added capability being support for non-slice DST tails, such as dyn Trait!), I don't have much motivation for improving slice-dst, due to its clunky (but functional!) API.

CAD97 commented

Side note on triomphe: there's an implicit reliance on storing &Header (thin) and using it to access &(Header + Slice) (fat). It's still unclear whether this is allowed by Rust's memory/borrowing model, and is currently disallowed by Stacked Borrows (miri) when tracking raw pointer validity more accurately (an additional flag on miri, due to the extra runtime cost). erasable (and slice-dst with erasable) suffer no such problem.

This is not unfixable, but will take a somewhat significant rewriting of triomphe's internals to fix (by using more thorough type erasure and/or more raw pointers) if deemed necessary. The general temperature of the UWG is that we want to support this (somewhat common) pattern, but it's not clear how to do so while retaining the "obviously correct" optimizations that Rust's strong aliasing xor mutability is supposed to allow for.

(If you know anything about how SB works: the problem is in reborrowing/retagging at function boundaries, which means references can only be used to access the amount of memory the type says it wants to, prohibiting the thin-to-fat expansion for references. erasable sidesteps this by always using type erased raw pointers for thin pointers.)

One other side note: erasable is suboptimal in that you have to use Thin<&Dst> to get a thin pointer, and &Dst is still a fat pointer. If we ever get extern type (avoiding the SB problem), I plan one more pointer-utils crate to allow library usage of extern type to provide "native"(ish) thin types to built-in fat DST tails, where &Dst is thin.

further documentation feedback:

The use of Thin<Box<_>> in the example here https://docs.rs/erasable/1.2.1/erasable/struct.Thin.html might be making it unclear as to whether or how the type is being erased from the caller-facing API? My prior is that it isn't being erased there, but the API is called "erasable" so it introduces confusion. If it is erased, that should be made clearer too.