indexmap-rs/indexmap

`IndexMap` sometimes reports a lower capacity after removing elements

Closed this issue · 9 comments

I made a small test script that adds and removes elements from an IndexMap:

use indexmap::IndexMap;

fn main() {
	let mut map = IndexMap::with_capacity(10);
	for i in 0..1000 {
		map.insert(i, i);
		println!("{} / {}", map.len(), map.capacity());
		if i % 5 == 0 {
			map.swap_remove_index(i / 2);
			map.swap_remove_index(i / 3);
			map.swap_remove_index(i / 4);
		}
	}
}

Sometimes, when the length of the map (left) decreases, the capacity (right) decreases as well.

410 / 431
411 / 432
412 / 432
413 / 432
411 / 431
412 / 431
413 / 431
414 / 432
415 / 432
414 / 430
415 / 430
416 / 430
417 / 430
418 / 431
416 / 429
417 / 429
418 / 429
419 / 430
420 / 430
419 / 429
420 / 429
421 / 429
422 / 430
423 / 430
421 / 427
422 / 427
423 / 427
424 / 427
425 / 427

From what I understand, the "capacity" is the total number of elements the map can hold in the chunk of memory it has allocated. If you add an item when the map is full, I would expect the capacity to increase as the map allocates more memory. I would never expect the capacity to decrease.

This behavior occurs with both swap_remove_index and shift_remove_index.

This is an artifact of hashbrown::raw::RawTable, and you can reproduce it with std::collections::HashMap too, just removing by key instead of index. Capacity is reported as items + growth_left, but the latter is only sometimes updated on removal. That is, a removed entry may be marked EMPTY and increase growth_left, but it sometimes needs a DELETED tombstone and does not change growth_left, which is when you see the reported capacity decrease.

IndexMap has two relevant capacities, the RawTable and the item Vec, and we report the lesser of those.

Why does DELETED not change growth_left? Should I take this to mean that the capacity of the map actually decreases, i.e. if I create a map with a capacity of 10, later on an allocation might be required to add items even if the new length does not exceed 10?

Why does DELETED not change growth_left?

I think it's because those DELETED entries are considered part of the load factor of the map, affecting how far you might have to search for entries from their ideal hash location. Hashbrown doesn't shift other items back when removing.

Should I take this to mean that the capacity of the map actually decreases, i.e. if I create a map with a capacity of 10, later on an allocation might be required to add items even if the new length does not exceed 10?

Yes, that's true. If a new insertion lands on a DELETED entry, it can reclaim that without resizing, but otherwise it has to reconcile the load factor. One further trick is that when there are a lot of DELETED entries, it may just rehash everything in-place instead of actually reallocating. IndexMap already stores the computed hashes, so rehashing the raw table is relatively cheap.

We could potentially add some heuristics to indexmap to scrub the table more aggressively, and/or add some API to let the user trigger that themselves.

OK, so I guess that means I shouldn't use IndexMap in a scenario where I don't want to trigger memory allocations after the initial creation of the map?

Currently, the only guarantee about avoiding allocation is if you dynamically check the capacity().

With hashbrown 0.11 (#181) we could add a method that internally uses RawTable::try_insert_no_grow, and on failure we could rehash to scrub DELETED entries and try again. Maybe we could also figure that out now using the reported capacity() and what we know of the expected load factor of the raw buckets().

In the meantime, maybe it would be good to add something to the documentation about this? I'm sure I'm not the only one confused about why the capacity can decrease over time.

bluss commented

Adding docs seems like the sensible action to resolve this issue, for the rest we refer to hashbrown.

Hashbrown's capacity says this:

This number is a lower bound; the HashMap<K, V> might be able to hold more, but is guaranteed to be able to hold at least this many.

We could say something like that, and perhaps also note that removal may cause that to decrease.

@tesselode if you're interested, maybe you could comment on rust-lang/hashbrown#255 why you're concerned about this, and whether you think such a manual vacuum operation would work for you.