`IndexMap` sometimes reports a lower capacity after removing elements
Closed this issue · 9 comments
I made a small test script that adds and removes elements from an IndexMap
:
use indexmap::IndexMap;
fn main() {
let mut map = IndexMap::with_capacity(10);
for i in 0..1000 {
map.insert(i, i);
println!("{} / {}", map.len(), map.capacity());
if i % 5 == 0 {
map.swap_remove_index(i / 2);
map.swap_remove_index(i / 3);
map.swap_remove_index(i / 4);
}
}
}
Sometimes, when the length of the map (left) decreases, the capacity (right) decreases as well.
410 / 431
411 / 432
412 / 432
413 / 432
411 / 431
412 / 431
413 / 431
414 / 432
415 / 432
414 / 430
415 / 430
416 / 430
417 / 430
418 / 431
416 / 429
417 / 429
418 / 429
419 / 430
420 / 430
419 / 429
420 / 429
421 / 429
422 / 430
423 / 430
421 / 427
422 / 427
423 / 427
424 / 427
425 / 427
From what I understand, the "capacity" is the total number of elements the map can hold in the chunk of memory it has allocated. If you add an item when the map is full, I would expect the capacity to increase as the map allocates more memory. I would never expect the capacity to decrease.
This behavior occurs with both swap_remove_index
and shift_remove_index
.
This is an artifact of hashbrown::raw::RawTable
, and you can reproduce it with std::collections::HashMap
too, just removing by key instead of index. Capacity is reported as items + growth_left
, but the latter is only sometimes updated on removal. That is, a removed entry may be marked EMPTY
and increase growth_left
, but it sometimes needs a DELETED
tombstone and does not change growth_left
, which is when you see the reported capacity decrease.
IndexMap
has two relevant capacities, the RawTable
and the item Vec
, and we report the lesser of those.
Why does DELETED
not change growth_left
? Should I take this to mean that the capacity of the map actually decreases, i.e. if I create a map with a capacity of 10, later on an allocation might be required to add items even if the new length does not exceed 10?
Why does
DELETED
not changegrowth_left
?
I think it's because those DELETED
entries are considered part of the load factor of the map, affecting how far you might have to search for entries from their ideal hash location. Hashbrown doesn't shift other items back when removing.
Should I take this to mean that the capacity of the map actually decreases, i.e. if I create a map with a capacity of 10, later on an allocation might be required to add items even if the new length does not exceed 10?
Yes, that's true. If a new insertion lands on a DELETED
entry, it can reclaim that without resizing, but otherwise it has to reconcile the load factor. One further trick is that when there are a lot of DELETED
entries, it may just rehash everything in-place instead of actually reallocating. IndexMap
already stores the computed hashes, so rehashing the raw table is relatively cheap.
We could potentially add some heuristics to indexmap
to scrub the table more aggressively, and/or add some API to let the user trigger that themselves.
OK, so I guess that means I shouldn't use IndexMap
in a scenario where I don't want to trigger memory allocations after the initial creation of the map?
Currently, the only guarantee about avoiding allocation is if you dynamically check the capacity()
.
With hashbrown 0.11 (#181) we could add a method that internally uses RawTable::try_insert_no_grow
, and on failure we could rehash to scrub DELETED
entries and try again. Maybe we could also figure that out now using the reported capacity()
and what we know of the expected load factor of the raw buckets()
.
In the meantime, maybe it would be good to add something to the documentation about this? I'm sure I'm not the only one confused about why the capacity can decrease over time.
Adding docs seems like the sensible action to resolve this issue, for the rest we refer to hashbrown.
Hashbrown's capacity
says this:
This number is a lower bound; the
HashMap<K, V>
might be able to hold more, but is guaranteed to be able to hold at least this many.
We could say something like that, and perhaps also note that removal may cause that to decrease.
@tesselode if you're interested, maybe you could comment on rust-lang/hashbrown#255 why you're concerned about this, and whether you think such a manual vacuum
operation would work for you.