Question: Using with complex types like HashMap
nyurik opened this issue · 5 comments
I would like to build this type of a struct, just like I outlined in stackoverflow question. It seems Ouroboros is the closest to this, but I am still not certain if it's possible:
#[self_referencing]
struct StringIndex {
store: Vec<String>,
#[borrows(store)]
#[covariant]
index: HashMap<&'this str, usize>,
}
and implement some magical method to perform "lookup or push":
impl StringIndex {
pub fn get_or_insert(&mut self, key: &str) -> usize {
match self.index.entry(key) {
Occupied(v) => v.key(),
Vacant(v) => {
let idx = self.store.len();
// Create a new clone, and a borrowed ref to it
let new_value = key.to_string();
let borrowed_value = &key; // This wouldn't work in Rust
self.store.push(new_value);
self.index.insert(borrowed_value, idx);
idx
}
}
}
}
The thing that ouroboros
allows you to do is to keep a borrow that points to "this same object", but it's still subject to the rule that mutable access is exclusive access. Since store
is borrowed perpetually, you can never get an &mut
to it; doing so would be unsound (more than not possible without help — not something a valid program may ever do).
You can use an interior-mutable type as store
, though. In particular, typed_arena::Arena
might suffice; this compiles:
use std::collections::HashMap;
use typed_arena::Arena;
#[ouroboros::self_referencing]
struct StringIndex {
store: Arena<String>,
#[borrows(store)]
#[covariant]
index: HashMap<&'this str, usize>,
}
impl StringIndex {
pub fn get_or_insert(&mut self, key: &str) -> usize {
self.with_mut(|fields| -> usize {
if let Some(&value) = fields.index.get(key) {
value
} else {
let idx = fields.store.len();
let stored_str: &str = fields.store.alloc(key.to_owned());
fields.index.insert(stored_str, idx);
idx
}
})
}
}
However, given what you've written I imagine you also want to be able to get from usize
to &str
. typed_arena::Arena
doesn't do that because it intends to offer mutability of the elements; what you want is an append-only list (which probably exists, but I don't know of one to recommend). You could do it by adding a second vector, but perhaps there's a better way than this:
use std::collections::HashMap;
use typed_arena::Arena;
#[ouroboros::self_referencing]
struct StringIndex {
store: Arena<String>,
#[borrows(store)]
#[covariant]
index: HashMap<&'this str, usize>,
#[borrows(store)]
#[covariant]
rev_index: Vec<&'this str>,
}
impl StringIndex {
pub fn get_or_insert(&mut self, key: &str) -> usize {
self.with_mut(|fields| -> usize {
if let Some(&value) = fields.index.get(key) {
value
} else {
let idx = fields.store.len();
let stored_str: &str = fields.store.alloc(key.to_owned());
fields.rev_index.push(stored_str);
fields.index.insert(stored_str, idx);
idx
}
})
}
pub fn lookup_idx(&self, idx: usize) -> Option<&str> {
self.borrow_rev_index().get(idx).copied()
}
}
@kpreid thank you for such an awesome write up and explanation!!! TIL :) You are absolutely correct -- my use case is an ever-growing (accumulator style) vector -- the user needs to build a vector of values (using Vec::push()
only, no modifications/deletions) and use the index of those values. Once done inserting, the entire vector will be consumed. So the ideal interface to this accumulator/interner is:
// Note that this interface should support `&'static str`, `String`, or any other `T`s,
// but not a mix of them in the same container.
fn append<T>(&mut self, value: T) -> usize;
fn into(self) -> Vec<T>;
From the sounds of it, I may have to actually implement this as a separate crate with unsafe functionality - if the Vec is only growing, the non-exposed references to the vector's values shouldn't violate any Rust memory rules. So internally it would be a Vec<T>
and HashMap<&T, usize>
, and as long as I don't allow any kind of vector access or modification beyond push
, this should be sound.
P.S. Possible optimization of the above is if the T
is a &'static V
-- in this case it might make sense for the internal HashMap
to store T
itself, and not a ref to it (avoid double redirect): HashMap<T, usize>
the user needs to build a vector of values (using Vec::push() only, no modifications/deletions) and use the index of those values. Once done inserting, the entire vector will be consumed.
In this case, typed_arena
is all you need, because it has Arena::into_vec()
, and in order to get the Arena
out, you can use ouroboros's generated into_heads()
method. Just add this method to my first code sample (not tested):
pub fn into_vec(self) -> Vec<String> {
self.into_heads().store.into_vec()
}
if the
Vec
is only growing, the non-exposed references to the vector's values shouldn't violate any Rust memory rules.
No, that's not reliably true: when the Vec
grows, it allocates new memory and moves the existing data into it, invalidating references.
thanks, when i tried, i realized that the ref was into Vec, not into the memory allocated for the string itself. I came up with this (without Arena): I simply duplicate the String
object as is, and then implement the Drop trait to ensure that I don't do the double-free. Note that I couldn't figure out which version of the insert_or_get
to call, so I had to be explicit.
mod stores {
use std::collections::HashMap;
use std::mem;
#[derive(Default)]
struct LeakyKeyHashMap<K, V>(pub HashMap<K, V>);
impl<K, V> Drop for LeakyKeyHashMap<K, V> {
fn drop(&mut self) {
for k in mem::take(&mut self.0).into_keys() {
mem::forget(k);
}
}
}
#[derive(Default)]
pub struct Store<T> {
keys: Vec<T>,
index: LeakyKeyHashMap<T, usize>,
}
impl<T> Store<T> {
pub fn into_vec(self) -> Vec<T> {
self.keys
}
}
impl Store<String> {
pub fn insert_or_get(&mut self, key: String) -> usize {
if let Some(index) = self.index.0.get(&key) {
*index
} else {
let index = self.keys.len();
let (ptr, len, cap) = (key.as_ptr(), key.len(), key.capacity());
self.keys.push(key);
let key_dup = unsafe { String::from_raw_parts(ptr as *mut u8, len, cap) };
self.index.0.insert(key_dup, index);
index
}
}
}
impl Store<&'static str> {
pub fn insert_or_get(&mut self, key: &'static str) -> usize {
if let Some(index) = self.index.0.get(&key) {
*index
} else {
let index = self.keys.len();
self.keys.push(key);
self.index.0.insert(key, index);
index
}
}
}
}
fn main() {
use stores::Store;
let mut store = Store::default();
let foo_idx = Store::<String>::insert_or_get(&mut store, "foo".to_string());
Store::<String>::insert_or_get(&mut store, "bar".to_string());
let foo_idx2 = Store::<String>::insert_or_get(&mut store, "foo".to_string());
assert_eq!(foo_idx, foo_idx2);
let keys = store.into_vec();
assert_eq!(keys, vec!["foo".to_string(), "bar".to_string()]);
}
I made a new crate with the above idea, hope it is "sound" :) https://github.com/nyurik/dup-indexer