slightlyoutofphase/staticvec

There's not actually any practical reason for the `length` field of a StaticVec to be of type `usize`

slightlyoutofphase opened this issue · 7 comments

It's just kinda unavoidable due to nothing else being accepted as an array index in Rust (despite the fact that I'm quite sure it's not even possible to instantiate a static array with a length anywhere close to where you'd need usize indices, because you'd overflow the stack way before then.)

u32 would make significantly more sense IMO (at least on x64) and probably improve performance in places due to the layout optimizations I believe it would allow for. It's unclear what an ergonomic way to do it is, though.

Any suggestions / opinions would be very welcome!

Turns out this probably doesn't matter optimization-wise as much as I thought it did, if at all.

Evrey commented

I propose reöpening this issue, as dependent typing can be used to pick the most fitting type. This would e.g. allow for 255 bytes strings to take up exactly 256 bytes, no padding. There are cases where such compact layouts are desirable. In addition to that, in most ABIs and calling conventions, structs that are up to 2 registers in size get passed via registers. So cutting down the size of the length field benefits tiny containers a lot.

#![feature(const_generics)]

trait T { type I; }
struct X<const N: usize>(std::marker::PhantomData<[u8; N]>);

impl T for X<1> { type I = u8;  }
impl T for X<2> { type I = u16; }

fn main() {
    println!("{}B, {}B",
        std::mem::size_of::<<X::<1> as T>::I>(),
        std::mem::size_of::<<X::<2> as T>::I>(),
    );
}

I'll probably try to revisit making this work ergonomically somehow once const generics functionality gets a little bit more powerful than it currently is.

Figured I might as well go ahead and re-open this as previously suggested just for the sake of making sure I don't forget about it, as it is for sure something I want to address whenever a reasonable way of doing so becomes apparent.

It's definitely one use case / situaton where Rust's lack of something resembling C++ decltype functionality (and lack of support for functions that can generically return types in the compile-time context of actually specifying what type a concrete struct field should be, based on constant input) is a bit of a "papercut" IMO.

For example, if staticvec was a C++ library, I certainly would have written the basic skeleton of it very roughly as follows from day one:

#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

template <typename T>
constexpr auto max = std::numeric_limits<T>::max;

template <const size_t N>
consteval auto get_type() noexcept {
  if constexpr (N >= 0 && N <= max<uint8_t>()) {
    return static_cast<uint8_t>(N);
  } else if constexpr (N > max<uint8_t>() && N <= max<uint16_t>()) {
    return static_cast<uint16_t>(N);
  } else if constexpr (N > max<uint16_t>() && N <= max<uint32_t>()) {
    return static_cast<uint32_t>(N);
  } else if constexpr (N > max<uint32_t>() && N <= max<uint64_t>()) {
    return static_cast<uint64_t>(N);
  } else {
    // not really possible to get here, but it seemed like the best way to write the `else`.
    return static_cast<uint64_t>(N);
  }
}

template <typename T, const size_t N>
struct StaticVec {
private:
  using IndexT = decltype(get_type<N>());
  using ArrayT = std::array<T, N>;
  // this would be for `filled_with`
  using Initializer = auto (*)() -> T;
  // this would be for `filled_with_by_index`
  using IndexedInitializer = auto (*)(const IndexT) -> T;

  static constexpr IndexT CAPACITY = get_type<N>();

  ArrayT data{};
  IndexT length = 0;
};

@Evrey

It occurred to me that perhaps I might not have fully grasped what you were getting at with your original "dependent typing" snippet. Is there a form of what you gave an example of that doesn't rely on E.G. macro impls for a hardcoded range of numbers, but rather "just works" without breaking anything the crate currently does? I'd be totally fine with something that required me to do casts to usize from whatever type internally, but anything that makes the user-facing aspect of things any different from how it is now is a "deal breaker", I'd say.

Evrey commented

@slightlyoutofphase

It's easy, really. I ask for no user-visible API change, except for the return types of capacity and length functions. That code snippet of mine just demonstrated that it is possible right now to get a different integer type depending on some integer constant. So in summary for users the only change is fn len(&self) -> usize to fn len(&self) -> Self::Len or whatever the names may be. And that type is no wrapper, but just an alias for u8 to u128 or whatever.

However, fiddling around a bit more with it, it turns out that the current const generics are still not powerful enough to make that Self::Len or X<N>::I or whatever easy to implement. In theory you'd just do a match or some next_power_of_two magic or so to turn the capacity of a StaticVec into a byte size of the smallest possible integer type needed to store the length.

Sooo… this whole thing may have to wait for full const generics. Unless we want a macro expansion depending on, say, 4.3 billion numbers.

However, fiddling around a bit more with it, it turns out that the current const generics are still not powerful enough to make that Self::Len or X<N>::I or whatever easy to implement. In theory you'd just do a match or some next_power_of_two magic or so to turn the capacity of a StaticVec into a byte size of the smallest possible integer type needed to store the length.

Sooo… this whole thing may have to wait for full const generics. Unless we want a macro expansion depending on, say, 4.3 billion numbers.

Yeah, this is basically what I also thought was (unfortunately) currently the case. I just wanted to make sure I hadn't missed anything about what you were actually trying to say. Thanks for the input, again.