tobgu/peds

vector improvement ideas + perf numbers to README?

fingon opened this issue · 7 comments

First of all, great library. I think it is the first reasonable-looking Golang data structure library that I have seen; most of the other stuff either drowns in locks or is functional, and either way it ends up really inefficient or awkward to use from multiple goroutines at once.

So, on to questions/requests:

  • Insert/Delete would also be nice to have in vector/vector slice

  • why doesn't the vector slice have ToNativeSlice? (It would be more efficient to take a vector slice and then call ToNativeSlice on only the parts you are interested in, as opposed to the whole vector.)

  • any chance of perf benchmarks? The code looks sensible, but I am wondering how much of a hit I will take if I actually use this, as opposed to dealing with locking costs + mutable data structures.

(I can guess that O(log32(n)) is heavily involved in some things that would otherwise be O(1), but I also wonder about the constants, due to memory allocation etc.)
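For a concrete feel for the log32 part, here is a small self-contained sketch (nothing PEDS-specific, it just computes the depth of a 32-way branching trie for a few sizes):

package main

import "fmt"

// depth returns how many levels a 32-way branching trie needs to hold n items.
func depth(n int) int {
    d := 1
    for capacity := 32; capacity < n; capacity *= 32 {
        d++
    }
    return d
}

func main() {
    for _, n := range []int{1000, 1000000, 1000000000} {
        fmt.Printf("n=%d -> trie depth %d\n", n, depth(n))
    }
}

So the logarithmic factor stays tiny even for large vectors; what I'm really curious about is the constant per level (allocation, copying, pointer chasing).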

tobgu commented

Hi,

Thanks for your comments and questions, some answers:

  • Insert/Delete equivalent to the functions implemented on Python lists, you mean? They could perhaps be included; I've mostly targeted feature compatibility with the corresponding data structures in Go for the MVP that PEDS currently is.
  • Should probably be implemented! I have not made a conscious decision to leave it out.
  • I'll try to compile a list of benchmarks to put in the README. I expect the read path to be OK perf-wise, but the write path is probably a fair bit heavier than for the corresponding mutable types. The Go GC is not really optimized for handling the high number of small, short-lived objects that these kinds of data structures tend to produce when "modifying" them. A sketch of what such a benchmark could look like follows below.
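To make that concrete, here is a rough, self-contained benchmark skeleton (put it in a file ending in _test.go). The "persistent" case uses a naive copy-on-write append purely as a stand-in for the allocation pressure of a persistent write path; it does not reference any generated PEDS type and says nothing about PEDS' actual numbers:

package peds

import "testing"

// BenchmarkNativeAppend measures plain, mutable appends into a Go slice.
func BenchmarkNativeAppend(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        s := make([]int, 0)
        for j := 0; j < 1000; j++ {
            s = append(s, j)
        }
        _ = s
    }
}

// BenchmarkCopyOnWriteAppend copies the whole slice on every append, as a
// crude stand-in for the extra allocations of a persistent write path.
func BenchmarkCopyOnWriteAppend(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        s := make([]int, 0)
        for j := 0; j < 1000; j++ {
            next := make([]int, len(s)+1)
            copy(next, s)
            next[len(s)] = j
            s = next
        }
        _ = s
    }
}

Run with go test -bench=. -benchmem to see both time per op and allocations.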

Yep, I was talking about the Python equivalents of e.g. del list[idx] or list[4:4] = [value].

Go data structures in general (and slices in particular) are a bit too minimalistic for my liking; I have plenty of code that does multiple lines of boring things just to get something that is one line in Python.

The MVP-ness of PEDS is quite cool. I looked at most of the other data structure libraries recently and most of them did not do the M or the V.. :-p (e.g. 'sure, we have 20 data structures, but some of them are missing delete, and others have O(n) behavior for half the operations. Good luck.')

The write path turned out to be prohibitively expensive for what I was doing with the library. At a guess, the only way this could really work is if one used some sort of internal freelist pool and allowed for slight inefficiency in memory usage. That sounds ... nasty. Sigh.

tobgu commented

OK, that's a shame. Do you have any specific benchmarks that you can share? I haven't had time to produce any yet.

I have given some thought to memory management but not really come up with any viable optimizations:

  • Free lists: Would basically put the burden on the user to do explicit deallocation of the objects so that they can be returned. Since the nodes may be shared, some kind of reference counting would also have to be used, which would then have to be protected by locks for thread safety. Doesn't feel like the way to go...
  • Allocation of larger blocks of memory (backing arrays) that can then be sliced up and used by the nodes. The problem with this approach is that it risks causing memory leaks, since a backing array will stay alive until all slices referencing it have been collected. Since the nodes are shared, that wouldn't work either... (a small illustration of this follows below)
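To illustrate the retention problem in the second point, a minimal self-contained example (nothing PEDS-specific): a small slice carved out of a big block keeps the whole block reachable for the GC:

package main

import "fmt"

// nodeFromBlock slices a tiny "node" out of a large shared backing array.
func nodeFromBlock() []byte {
    block := make([]byte, 1<<20) // 1 MiB block intended to be shared by many nodes
    node := block[:32]           // one small node carved out of it
    // Returning node keeps the entire 1 MiB block alive, because the slice
    // header still points into block's backing array.
    return node
}

func main() {
    n := nodeFromBlock()
    fmt.Println(len(n), cap(n)) // 32 1048576 -- cap exposes the retained block
}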

I'm happy for more input on this! PR with benchmarks is also welcome! :-)

Hello,

[ ... ] Since the nodes may be shared some kind of reference counting would also have to be used which would then have to be protected by locks for thread safety.

Surely refcounting can just be done with atomic.AddUint32, no? Something like this:

func (v *privatePersonBySsnItemBucketVector) Free() {
    // Atomically decrement the refcount; adding ^uint32(0) is the idiomatic
    // way to subtract 1 from a uint32 with the atomic package.
    if atomic.AddUint32(&v.ref, ^uint32(0)) == 0 {
        // Last reference released: hand the bucket back to the pool for reuse.
        someGlobalPool.Put(v)
    }
}

In the simplest case, someGlobalPool could be implemented by a sync.Pool. If we wanted to get fancy, we could implement it as a global PEDS set and a CAS loop. Thoughts?
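To be a bit more concrete, here is a minimal sketch of how the pooling could hang together, assuming a hypothetical bucketNode type; the names (bucketNode, nodePool, Retain, Free) are illustrative and not part of the current PEDS API:

package main

import (
    "sync"
    "sync/atomic"
)

type bucketNode struct {
    ref   uint32
    items [32]interface{}
}

var nodePool = sync.Pool{
    New: func() interface{} { return &bucketNode{} },
}

// newNode hands out a node with a single owner.
func newNode() *bucketNode {
    n := nodePool.Get().(*bucketNode)
    n.ref = 1
    return n
}

// Retain is called whenever a node becomes shared by another structure.
func (n *bucketNode) Retain() { atomic.AddUint32(&n.ref, 1) }

// Free drops one reference and recycles the node once nobody holds it.
// Callers that never call Free simply leave the node to the GC instead.
func (n *bucketNode) Free() {
    if atomic.AddUint32(&n.ref, ^uint32(0)) == 0 {
        *n = bucketNode{} // clear before pooling so old items can be collected
        nodePool.Put(n)
    }
}

func main() {
    n := newNode()
    n.Retain() // node becomes shared
    n.Free()   // one owner drops it
    n.Free()   // last owner drops it; the node goes back to the pool
}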

Either way +1 for benchmarks.

Thanks for this nice little library!

tobgu commented

Hi!

Sure, an atomic.AddUint32 or similar could probably be used for this! It would still put the burden on the user to free the objects though, which is a shame. It may be OK though, since it's more of an optimisation than something you would actually have to do.

I like your ideas around the pooling and would be happy taking a PR fleshing it out! (The same goes for benchmarks :-)).

Sure, an atomic.AddUint32 or similar could probably be used for this! It would still put the burden on the user to free the objects though, which is a shame. It may be OK though, since it's more of an optimisation than something you would actually have to do.

Good point! I always tend to forget that sync.Pool isn't really manual memory management, and you can always offload the responsibility to the garbage collector.

Also, at the risk of being cheeky, PEDS itself is arguably an optimization! If users are reaching for this library, it's already because they're trading convenience for performance. It doesn't seem unreasonable to push that tradeoff a bit further, especially if users don't have to use the Free method.

I like your ideas around the pooling and would be happy taking a PR fleshing it out! (The same goes for benchmarks :-)).

I'm currently evaluating PEDS for a project of mine, but I'm not quite at the point where I'm using it. If/when that day comes, I'm not at all opposed to taking a stab at it, and I'll definitely be in touch for some guidance!

Thanks again for this great work!