w8r/avl

suggestion: let the remove operation preserve node identity

in C++ STL terms: don't invalidate iterators when elements are removed, except those pointing to the removed node.

(this assumes the nodes returned by the various APIs are to be regarded as "iterators", and that users don't tamper with them, which is a common assumption in the javascript world)

the modification is rather straightforward. in the remove function, instead of assigning the data and key of the "lifted node" to the "deleted node", relink the left, right and parent references so that the "lifted node" takes the "deleted node"'s place in the tree, then cut the "deleted node"'s outward references (but preserve its key and data).
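to make the idea concrete, here is a minimal sketch of removal by relinking. it is not the library's code: `Tree`, `createNode`, `_replaceInParent` and `next` are hypothetical names, the tree is a plain unbalanced BST with parent pointers, and the AVL rebalancing that a real patch would still have to run after relinking is left out.

```javascript
// Hypothetical plain BST with parent pointers. Balancing is omitted on purpose;
// only the "relink instead of copy" part of remove is shown.
function createNode(key, data) {
  return { key, data, left: null, right: null, parent: null };
}

class Tree {
  constructor() { this.root = null; }

  insert(key, data) {
    const node = createNode(key, data);
    if (this.root === null) { this.root = node; return node; }
    let cur = this.root;
    for (;;) {
      const side = key < cur.key ? 'left' : 'right';
      if (cur[side] === null) { cur[side] = node; node.parent = cur; return node; }
      cur = cur[side];
    }
  }

  find(key) {
    let cur = this.root;
    while (cur !== null && cur.key !== key) cur = key < cur.key ? cur.left : cur.right;
    return cur;
  }

  // Put `newNode` (may be null) into the slot that `oldNode` occupies in its parent.
  _replaceInParent(oldNode, newNode) {
    const p = oldNode.parent;
    if (p === null) this.root = newNode;
    else if (p.left === oldNode) p.left = newNode;
    else p.right = newNode;
    if (newNode !== null) newNode.parent = p;
  }

  // Removes by relinking and returns the removed node with key/data intact.
  remove(key) {
    const node = this.find(key);
    if (node === null) return null;
    if (node.left === null) this._replaceInParent(node, node.right);
    else if (node.right === null) this._replaceInParent(node, node.left);
    else {
      // The "lifted node" is the in-order successor: leftmost node of the right subtree.
      let lifted = node.right;
      while (lifted.left !== null) lifted = lifted.left;
      if (lifted.parent !== node) {
        // Unhook the lifted node; its right child takes its old slot.
        this._replaceInParent(lifted, lifted.right);
        lifted.right = node.right;
        lifted.right.parent = lifted;
      }
      this._replaceInParent(node, lifted);
      lifted.left = node.left;
      lifted.left.parent = lifted;
    }
    // Cut the removed node's outward references, keep its key and data.
    node.left = node.right = node.parent = null;
    return node;
  }
}

// In-order successor, using only the node's own links.
function next(node) {
  if (node.right !== null) {
    let n = node.right;
    while (n.left !== null) n = n.left;
    return n;
  }
  let p = node.parent;
  while (p !== null && p.right === node) { node = p; p = p.parent; }
  return p;
}

const tree = new Tree();
const held = {};
for (const k of [50, 30, 70, 60, 80, 65]) held[k] = tree.insert(k, 'v' + k);

const removed = tree.remove(50);           // 60 is lifted into the place of 50
console.log(removed === held[50]);         // true
console.log(tree.find(60) === held[60]);   // true
console.log(next(held[60]) === held[65]);  // true
```

with the copy approach, the object held as `held[60]` would be the one detached from the tree, and `held[50]` would silently start carrying key 60; with relinking, every held node except the removed one stays valid.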

as far as i know, among the currently implemented APIs the remove operation is the only one that breaks this. (more care may be needed if split and join are added)

this allows iterator-style usage to work (there might not be much of it, but i really do have code that uses iterators the iterator way). without this property a workaround exists, but one has to find by .key, call prev/next, and store .key again, over and over, which hurts both readability (plus code size) and overall performance.

also, returning the deleted node makes more sense (it could be inserted again with the fictional insertNode API).

there might be some performance degradation because of the extra work, but since remove is the fastest operation here, this property seems worth a small price. or maybe offer it as a separate removeGracefully?

w8r commented

some other doc fixes

  • the doc is missing the return type of insert, which returns a Node.
  • the naming of data vs. value is not consistent
  • the inclusiveness of the range bounds is not specified

and further suggestions, mainly a matter of my own taste (for everything not mentioned here, your library is more to my taste than any other avl tree library on npm):

  • noDuplicates by default, hmm... yes i prefer it... "[multi]set" should be explicitly specified.
  • also, can we guarantee that duplicate keys will always stay in the order in which they were inserted?
  • insert could accept a third boolean argument meaning "don't update data/value if the key already exists" (defaults to false, which means do update) (only works with noDuplicates? or also when duplicates are allowed?)
  • insertNode (if it's to be added) errors when key exists and noDuplicates is set, because we can't decide which to keep, since both could be regarded as iterators.
  • hyperextend find: in my opinion it's better supported by an optional argument (rather than 4 more APIs with various confusing names, such as xxBound), namely firstGT, firstGE, lastLT, lastLE, where the default argument just means Equal (but returns an arbitrary one in the equal range when duplicates are allowed)
  • rangeByNodes, that kinda matches c++ iterators on multiset, using nodes returned by extended find.
  • btw, i think it's better for range* to use [lo,hi) (or say [start,end)) by default, no-op when lo is null, iterate to the end when hi is null. it would be good to have a reversed argument, and even better an inclusive one too? (i admit there might be some holy war here)
  • rename pop to popMin, because javascript arrays pop the last value and shift the first value, but on a tree we don't shift. it's better to be clear here.
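to pin down what the find modes and the [lo,hi) range above would mean, here is a rough sketch over a sorted array standing in for the tree. everything in it is hypothetical (`findWithMode`, `rangeHalfOpen`, the `'EQ'` default name, returning an index with -1 for "not found"); a real version would walk tree nodes and return a node or null. the null handling, reversed and inclusive options of range are not covered.

```javascript
// First index i with keys[i] >= key (keys must be sorted ascending).
function lowerBound(keys, key) {
  let lo = 0, hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (keys[mid] < key) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// First index i with keys[i] > key.
function upperBound(keys, key) {
  let lo = 0, hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (keys[mid] <= key) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Hypothetical single entry point: one optional mode instead of four xxBound APIs.
// Returns an index into `keys`, or -1 when nothing matches.
function findWithMode(keys, key, mode = 'EQ') {
  const ge = lowerBound(keys, key);
  const gt = upperBound(keys, key);
  let i;
  switch (mode) {
    case 'EQ':      i = ge < gt ? ge : -1; break; // any element of the equal range would do
    case 'firstGE': i = ge; break;
    case 'firstGT': i = gt; break;
    case 'lastLT':  i = ge - 1; break;
    case 'lastLE':  i = gt - 1; break;
    default: throw new RangeError('unknown mode: ' + mode);
  }
  return i >= 0 && i < keys.length ? i : -1;
}

// Half-open range [lo, hi): from the first key >= lo up to, but excluding, the first key >= hi.
function rangeHalfOpen(keys, lo, hi) {
  return keys.slice(lowerBound(keys, lo), lowerBound(keys, hi));
}

const keys = [10, 20, 20, 30];
console.log(findWithMode(keys, 20));            // 1
console.log(findWithMode(keys, 20, 'lastLE'));  // 2
console.log(findWithMode(keys, 20, 'firstGT')); // 3
console.log(findWithMode(keys, 30, 'firstGT')); // -1
console.log(rangeHalfOpen(keys, 20, 30));       // [ 20, 20 ]
```

with duplicates allowed, firstGE and lastLE give the two ends of the equal range, which is what a rangeByNodes would need as its start and end.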