coder/hnsw

memory overhead estimate?

Closed this issue · 9 comments

Hi! It'd be really useful if the readme included a rough estimate of the memory overhead, or other rough formula for expected steady state memory usage, in terms of the number of vectors and their shape. Thanks. :)

Let me know if this helps.

Yes, thank you!

Actually, I have a follow-up. :)

One of the factors is listed as the size of the IDs. But IDs are strings, which means that there's also a 2 word (16 bytes on arm64/amd64) overhead per string, which will typically dwarf the actual string contents. Given that, I wonder whether it'd be better to have IDs be (u)int64s instead. It'd provide lower, constant overhead, and also make map operations faster.

Good point on the additional words.

I went with strings to provide broad compatibility with existing IDs. I don't want most users maintaining some mapping between strings and uint64s to use the graph, and, I find large stringy IDs such as UUIDs are more popular these days than numerics.

Another idea is accepting a second generic parameters, K comparable. I don't like how it complicates graph instantiation and gives the user yet another parameter to think about, but perhaps the flexibility is worth it.

I don't like how it complicates graph instantiation

Seconded.

But OTOH this is exactly the sort of scenario in which generics shine.

How important is this level of performance to your use case?

I’m skirting the edge of what I can do now, at 500gb of RAM. So for me, it’d be pretty useful; I could use uint32s and scale up. But I don’t know whether I am a typical user. :)

Very neat that you're using it at that scale! I have yet to test with graphs that large. Let me know what you think about the API here.

new api looks great. closing this again. :)