klauspost/pgzip

dictFlatePool not helping much (lots of new flate compressor allocations)

flyingmutant opened this issue · 2 comments

Here is a snippet of the alloc_space heap profile for a program using pgzip to write a ~770MB file:

> go tool pprof -alloc_space PROG mem.prof
Entering interactive mode (type "help" for commands)
(pprof) top20
66762.39MB of 67431.14MB total (99.01%)
Dropped 55 nodes (cum <= 337.16MB)
Showing top 20 nodes out of 41 (cum >= 12452.37MB)
      flat  flat%   sum%        cum   cum%
17734.77MB 26.30% 26.30% 17734.77MB 26.30%  compress/flate.(*compressor).init
...
 3473.48MB  5.15% 81.09%  3473.48MB  5.15%  github.com/.../vendor/github.com/klauspost/pgzip.(*Writer).compressCurrent
...
 2757.36MB  4.09% 89.66% 20834.23MB 30.90%  github.com/.../vendor/github.com/klauspost/pgzip.compressBlock

During compression, the program's memory usage grows by ~5GB. With regular gzip, memory usage stays constant during compression.

Am I reading this right that dictFlatePool is not helping as it should? Shouldn't the number of compressors be limited by the number of blocks?

@flyingmutant - thanks for reporting.

Is it possible for you to share the code showing how you send data to pgzip?

I can't post the actual code, but what it does is basically:

f, _ := os.Create(path) // error handling elided for brevity
bufF := bufio.NewWriter(f) // this may very well be unnecessary
gzF := gzip.NewWriter(bufF) // gzip here is klauspost/pgzip

buf := make([]byte, 0, 512)
for _, record := range records {
        buf = record.writeTo(buf) // appends the encoded record, returns the slice
        gzF.Write(buf)
        buf = buf[:0]
}

So: lots of small records, each encoded into a byte slice first (with strconv.AppendUint and the like), and that slice is then written to pgzip. Nothing fancy.
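If it helps, here is a self-contained version of that pattern (stdlib compress/gzip stands in for pgzip, which is API-compatible; the record type and its writeTo method are made-up illustrations):

```go
package main

import (
	"bytes"
	"compress/gzip" // klauspost/pgzip is a drop-in replacement for this import
	"fmt"
	"io"
	"strconv"
)

// record is a stand-in for the real record type.
type record struct {
	id, value uint64
}

// writeTo appends a textual encoding of r to b and returns the
// (possibly reallocated) slice, in the strconv.Append* style.
func (r record) writeTo(b []byte) []byte {
	b = strconv.AppendUint(b, r.id, 10)
	b = append(b, '\t')
	b = strconv.AppendUint(b, r.value, 10)
	return append(b, '\n')
}

func main() {
	records := []record{{1, 100}, {2, 200}}

	var out bytes.Buffer
	gzF := gzip.NewWriter(&out)

	buf := make([]byte, 0, 512)
	for _, rec := range records {
		buf = rec.writeTo(buf[:0]) // reuse the scratch buffer
		if _, err := gzF.Write(buf); err != nil {
			panic(err)
		}
	}
	if err := gzF.Close(); err != nil {
		panic(err)
	}

	// Decompress to verify the round trip.
	zr, err := gzip.NewReader(bytes.NewReader(out.Bytes()))
	if err != nil {
		panic(err)
	}
	plain, err := io.ReadAll(zr)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(plain))
}
```

The writer sees many small Write calls from a single reused buffer, which is the same shape of workload as the program profiled above.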