
trope


Trope is a Go package for working with very large immutable arrays while keeping random-access edits reasonably fast.

It uses a rope-like tree structure, which is neither balanced by default nor binary like a regular rope.

Most rope-like structures perform poorly at small sizes, so the package also provides a Hybrid implementation that uses regular slices for small sizes and switches to the rope-like structure above a configured threshold, getting the best of both worlds.
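
For intuition, the core structure can be sketched as a node whose children each record how many elements they cover, with lookups skipping whole children by size. The types and names below are purely illustrative and are not the package's actual API:

package main

import "fmt"

// span is an illustrative leaf: a contiguous chunk of the array.
type span struct{ data []byte }

// child covers either a leaf or a subtree, along with the number of
// elements it spans; a node may have any number of children.
type child struct {
	size int
	leaf *span
	sub  *node
}

type node struct{ children []child }

// at finds the element at offset i by skipping whole children,
// so a lookup only descends into the subtree that contains i.
func (n *node) at(i int) byte {
	for _, c := range n.children {
		if i < c.size {
			if c.leaf != nil {
				return c.leaf.data[i]
			}
			return c.sub.at(i)
		}
		i -= c.size
	}
	panic("index out of range")
}

func main() {
	root := &node{children: []child{
		{size: 5, leaf: &span{data: []byte("hello")}},
		{size: 6, leaf: &span{data: []byte(" world")}},
	}}
	fmt.Printf("%c\n", root.at(6)) // prints "w"
}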

Benchmark stats

Trope is mainly focused on good performance under random edits (not just appends).

The benchmark is run via something like:

$ go test --bench=. --strlen=1000000

The following table shows the time (in microseconds) taken by each implementation for 100 random splice operations (on my machine). Trope's cost is roughly fixed regardless of array size, but that fixed cost only beats the raw slice approach at around 15k elements.

String Size    5k     10k    100k    1M
Trope          219    188    181     188
String         70     134    1040    10195
Skiprope       123    193    1240    12159
Rope           233    256    452     2501
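
For context, a splice on a flat Go string rebuilds the whole string, so each edit costs O(n); that is why the String row grows with size while Trope stays roughly flat. A minimal version of such a splice (not the benchmark's exact code) looks like this:

package main

import "fmt"

// spliceString replaces s[offset:offset+remove] with insert by building a
// brand-new string, so every edit copies the whole buffer: O(len(s)) per
// splice, which is why the flat-string numbers grow with string size.
func spliceString(s string, offset, remove int, insert string) string {
	return s[:offset] + insert + s[offset+remove:]
}

func main() {
	fmt.Println(spliceString("hello world", 6, 5, "gophers")) // hello gophers
}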

Memory usage is also a bit of a concern. Here too, though, the trope implementation stays much steadier and does not depend much on the size of the string.

$ go test --bench=. --strlen=100000 -benchmem
goos: darwin
goarch: amd64
pkg: github.com/perdata/trope
BenchmarkTrope-4      	   10000	    180376 ns/op	  172648 B/op	    1519 allocs/op
BenchmarkString-4     	    2000	   1079424 ns/op	10084772 B/op	     200 allocs/op
BenchmarkSkiprope-4   	    1000	   1235641 ns/op	  339567 B/op	    3231 allocs/op
BenchmarkRope-4       	    3000	    451715 ns/op	  671616 B/op	    4398 allocs/op

For string sizes larger than 100k, trope stays about the same while the other implementations get worse (either in B/op or in allocs/op):

$ go test --bench=. --strlen=1000000 -benchmem
goos: darwin
goarch: amd64
pkg: github.com/perdata/trope
BenchmarkTrope-4      	   10000	    186289 ns/op	  188648 B/op	    1577 allocs/op
BenchmarkString-4     	     100	  10645611 ns/op	100033109 B/op	     200 allocs/op
BenchmarkSkiprope-4   	     100	  11908668 ns/op	 3331194 B/op	   31355 allocs/op
BenchmarkRope-4       	     500	   2436868 ns/op	 4276162 B/op	    4399 allocs/op

For smaller string sizes, trope's memory usage stays roughly the same, and only Skiprope uses less:

$ go test --bench=. --strlen=10000 -benchmem
goos: darwin
goarch: amd64
pkg: github.com/perdata/trope
BenchmarkTrope-4      	   10000	    193261 ns/op	  182008 B/op	    1602 allocs/op
BenchmarkString-4     	   10000	    136390 ns/op	  796433 B/op	     200 allocs/op
BenchmarkSkiprope-4   	   10000	    188041 ns/op	   33977 B/op	     415 allocs/op
BenchmarkRope-4       	    5000	    267254 ns/op	  323072 B/op	    4584 allocs/op

All of these results point to a hybrid approach that uses flat strings for small sizes and the tree structure only for large ones. The Hybrid type implements this idea.

A Hybrid implementation that switches from string to trope.Node at a size of 15k and back at 10k has the following characteristics at a 5k string size:

BenchmarkTrope-4      	   10000	    220341 ns/op	  211384 B/op	    1871 allocs/op
BenchmarkHybrid-4     	   20000	     88510 ns/op	  292664 B/op	     413 allocs/op
BenchmarkString-4     	   20000	     71310 ns/op	  283440 B/op	     200 allocs/op

This closely tracks the raw string cost at small sizes. At a 200k string size, Hybrid performs about the same as Trope, getting the best of both worlds:

BenchmarkTrope-4      	   10000	    186928 ns/op	  188264 B/op	    1583 allocs/op
BenchmarkHybrid-4     	   10000	    193612 ns/op	  188264 B/op	    1583 allocs/op
BenchmarkString-4     	    1000	   2243631 ns/op	20226499 B/op	     200 allocs/op
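
For completeness, the switching logic behind Hybrid can be sketched as a small state machine with two thresholds; using a higher switch-up mark (15k) than the switch-down mark (10k) adds hysteresis, so a value hovering near a single boundary does not flip representations on every edit. The names and thresholds below are illustrative, not the package's exported API:

package main

import "fmt"

// hybridMode tracks which representation is in use after each edit: a flat
// string while the value is small, the tree (trope.Node) once it grows past
// highMark, and back to a flat string only after it shrinks below lowMark.
type hybridMode struct {
	usingTree bool
	highMark  int // e.g. 15000
	lowMark   int // e.g. 10000
}

func (m *hybridMode) update(size int) {
	switch {
	case !m.usingTree && size > m.highMark:
		m.usingTree = true // convert the flat string to a tree here
	case m.usingTree && size < m.lowMark:
		m.usingTree = false // flatten the tree back to a string here
	}
}

func main() {
	m := hybridMode{highMark: 15000, lowMark: 10000}
	for _, size := range []int{5000, 16000, 12000, 9000} {
		m.update(size)
		rep := "flat"
		if m.usingTree {
			rep = "tree"
		}
		fmt.Println(size, "->", rep) // flat, tree, tree, flat
	}
}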