writeData is very slow for array and slice
case reflect.Array, reflect.Slice:
	n := rv.Len()
	for i := 0; i < n; i++ {
		elem := rv.Index(i)
		err := writeData(w, elem)
		if err != nil {
			return err
		}
	}
	return nil
}
Looping through arrays and slices, calling writeData (which uses reflection) for each element, is slow. Consider using Go code generation to generate a writeData for each data type.
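For illustration, a minimal sketch of the kind of fast path this suggests: type-switch on the concrete value before entering the reflection loop, so a whole slice goes out in a single binary.Write call. The package, function name and shape here are assumptions for the sketch, not npyio's actual code.

package fastpath // illustrative only

import (
	"encoding/binary"
	"io"
)

// writeDataFast reports whether it handled v; if not, the caller
// would fall back to the reflection-based writeData.
func writeDataFast(w io.Writer, v interface{}) (bool, error) {
	switch v := v.(type) {
	case []float64:
		// one binary.Write for the whole slice instead of one
		// reflection round-trip per element.
		return true, binary.Write(w, binary.LittleEndian, v)
	case []float32:
		return true, binary.Write(w, binary.LittleEndian, v)
	}
	return false, nil
}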
would you have a small (or as small as possible) repro?
(phrased another way: what are the timings you experience for a []float64 slice of N elements?)
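for what it's worth, a repro could be as small as a benchmark along these lines (hedged: the import path and the npyio.Write call are assumed from what appears later in this thread, and the slice length is arbitrary):

package npyio_test

import (
	"io/ioutil"
	"testing"

	"github.com/sbinet/npyio"
)

func BenchmarkWriteFloatSlice(b *testing.B) {
	data := make([]float64, 1000) // N is arbitrary
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if err := npyio.Write(ioutil.Discard, data); err != nil {
			b.Fatal(err)
		}
	}
}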
could you test 94eda44?
I get this kind of improvement:
benchmark                     old ns/op     new ns/op     delta
BenchmarkWriteFloatSlice-4    232706        35049         -84.94%
amazing, thanks!
I think you don't have to do
case []int:
	for _, vv := range v {
		err := binary.Write(w, ble, int64(vv))
		if err != nil {
			return err
		}
	}
	return nil
Probably you can just do

case []int:
	err := binary.Write(w, ble, []int(v))
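One hedged caveat on that: binary.Write only accepts fixed-size data, and Go's int is not fixed-size (note that []int is absent from the list in the next comment), so the closest whole-slice equivalent still needs a single up-front conversion, e.g.:

case []int:
	// int has no fixed wire size (binary.Write rejects it), so
	// convert once and write the whole slice in one call.
	vs := make([]int64, len(v))
	for i, vv := range v {
		vs[i] = int64(vv)
	}
	return binary.Write(w, ble, vs)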
Could you add support for all the slice types that binary.Write supports? Looking into the binary source, here are the slice types it supports (a sketch covering them follows the list):
case []int8:
case []uint8:
case []int16:
case []uint16:
case []int32:
case []uint32:
case []int64:
case []uint64:
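A hedged sketch of what covering those cases could look like (the shape is assumed, not npyio's actual code); even though binary.Write still dispatches on the type internally, handling the whole slice at once avoids the per-element reflection loop:

switch v := v.(type) {
case []int8, []uint8, []int16, []uint16,
	[]int32, []uint32, []int64, []uint64:
	// all fixed-size element types: binary.Write accepts the whole
	// slice in one call (v is interface{} in this multi-type case,
	// which binary.Write takes anyway).
	return binary.Write(w, ble, v)
}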
My use case: I am serializing a few GB of grayscale images (uint8) into an npy file for training a CNN.
Btw, I really appreciate the library! Being able to do as much as possible in Go has saved me a lot of time.
I tested type-switching on a few slice types (starting with []float64) but it is useless, at least when I call binary.Write(w, binary.LittleEndian, v), as binary.Write will type-switch/use-reflect under the hood.
so, unless I reach into the guts of binary.Write, that's as far as we can get, ATM.
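for concreteness, a hedged sketch of what reaching into the guts could look like for []float64: hand-encode into a byte buffer and write it in one go, bypassing binary.Write entirely (illustrative, not code from the repository; a real version would probably encode in fixed-size chunks to bound memory):

package fastpath // illustrative only

import (
	"encoding/binary"
	"io"
	"math"
)

// writeFloat64s encodes the slice manually with math.Float64bits,
// avoiding binary.Write's reflection machinery.
func writeFloat64s(w io.Writer, vs []float64) error {
	buf := make([]byte, 8*len(vs))
	for i, v := range vs {
		binary.LittleEndian.PutUint64(buf[8*i:], math.Float64bits(v))
	}
	_, err := w.Write(buf)
	return err
}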
do you have any timings for before/after the (tentative) fix with these GB-size []uint8 slices?
(alternatively, if you have a code snippet + some input data, I am willing to do the benchmark myself)
With the old implementation, writing 1.3GB took about 40 minutes on a MacBook Pro, so I killed the process (knowing that I have a few more GB and can't wait that long). Instead I generated 700MB of data in 20 minutes, and right now I am training with that 700MB. So here are my very rough numbers.
My code is very trivial, similar to

data := make([]uint8, 1024*1024*1024) // 1GB
npyio.Write(f, data)

In practice I fill data with grayscale images, but even without that it should produce similar speed numbers.
ok. I've added a fast-path for writing []uint8:
benchmark                     old ns/op     new ns/op     delta
BenchmarkWriteUint8Slice-4    2418          1977          -18.24%

benchmark                     old allocs    new allocs    delta
BenchmarkWriteUint8Slice-4    19            17            -10.53%

benchmark                     old bytes     new bytes     delta
BenchmarkWriteUint8Slice-4    1472          440           -70.11%
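for reference, a hedged guess at the shape of such a []uint8 fast path (illustrative, not the actual commit): uint8 elements are already raw bytes, so the slice can go straight to the writer.

case []uint8:
	// raw bytes need no byte-order conversion, so skip binary.Write
	// (and its internal buffering/reflection) entirely.
	_, err := w.Write(v)
	return err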
feel free to re-open if that's not sufficiently fast for your use-case.