Using cumsum from internal/asm
Closed this issue · 3 comments
Succo commented
It seems that there is an assembly version of cumsum in https://github.com/gonum/internal/tree/master/asm/f64
Any reason for not using it here yet?
kortschak commented
None. The internal/asm/... packages were only recently changed to include things like that. You are welcome to send a PR adding calls to f64.
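For readers unfamiliar with the operation, here is a minimal pure-Go sketch of what a cumulative-sum kernel computes and how a public wrapper might delegate to it. The names `cumSum` and `CumSum` below are hypothetical stand-ins written for illustration; this is not the assembly routine from internal/asm/f64, and the length check is an assumption about how such a wrapper could validate its arguments.

```go
package main

import "fmt"

// cumSum is a hypothetical pure-Go stand-in for an assembly kernel.
// It writes the running sum of s into dst and returns dst truncated
// to the number of elements written.
func cumSum(dst, s []float64) []float64 {
	if len(s) == 0 || len(dst) == 0 {
		return dst[:0]
	}
	n := len(s)
	if len(dst) < n {
		n = len(dst)
	}
	dst = dst[:n]
	dst[0] = s[0]
	for i := 1; i < n; i++ {
		dst[i] = dst[i-1] + s[i]
	}
	return dst
}

// CumSum sketches a public wrapper: it checks the slice lengths
// (an assumed policy) and then delegates to the kernel.
func CumSum(dst, s []float64) []float64 {
	if len(dst) != len(s) {
		panic("cumsum: length of destination does not match length of source")
	}
	return cumSum(dst, s)
}

func main() {
	s := []float64{1, 2, 3, 4}
	dst := make([]float64, len(s))
	fmt.Println(CumSum(dst, s)) // prints [1 3 6 10]
}
```

Keeping the length check in the wrapper and the loop in a separate function means the loop can later be swapped for an optimized implementation without changing callers.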
kortschak commented
Add, CumProd, CumSum, Div, DivTo
$ benchstat old new
name            old time/op   new time/op   delta
AddSmall-8      10.3ns ±14%   9.9ns ± 2%    ~         (p=0.386 n=10+9)
AddMed-8        314ns ± 6%    306ns ± 2%    ~         (p=0.076 n=10+8)
AddLarge-8      40.6µs ± 1%   41.7µs ± 1%   +2.72%    (p=0.000 n=10+9)
AddHuge-8       14.6ms ± 0%   14.9ms ± 3%   +2.41%    (p=0.000 n=10+10)
CumProdSmall-8  13.7ns ± 1%   13.4ns ± 5%   -1.90%    (p=0.006 n=10+10)
CumProdMed-8    4.22µs ± 8%   2.42µs ±22%   -42.54%   (p=0.000 n=10+10)
CumProdLarge-8  268µs ± 1%    76µs ± 1%     -71.61%   (p=0.000 n=9+10)
CumProdHuge-8   27.4ms ± 1%   14.1ms ± 1%   -48.44%   (p=0.000 n=10+10)
CumSumSmall-8   13.6ns ± 0%   14.1ns ± 0%   +3.68%    (p=0.000 n=6+9)
CumSumMed-8     2.63µs ± 1%   0.75µs ± 1%   -71.43%   (p=0.000 n=10+9)
CumSumLarge-8   267µs ± 1%    74µs ± 1%     -72.21%   (p=0.000 n=10+9)
CumSumHuge-8    27.6ms ± 2%   15.1ms ± 1%   -45.07%   (p=0.000 n=10+10)
DivSmall-8      11.7ns ± 0%   9.8ns ± 1%    -16.96%   (p=0.000 n=9+9)
DivMed-8        1.18µs ± 1%   0.59µs ± 0%   -49.57%   (p=0.000 n=10+6)
DivLarge-8      118µs ± 2%    59µs ± 0%     -49.66%   (p=0.000 n=10+9)
DivHuge-8       16.1ms ± 1%   14.7ms ± 1%   -8.20%    (p=0.000 n=10+10)
DivToSmall-8    11.8ns ± 1%   12.3ns ± 0%   +4.59%    (p=0.000 n=10+7)
DivToMed-8      1.18µs ± 1%   0.60µs ± 3%   -49.03%   (p=0.000 n=10+10)
DivToLarge-8    118µs ± 1%    60µs ± 2%     -49.23%   (p=0.000 n=10+10)
DivToHuge-8     20.5ms ± 1%   19.7ms ± 0%   -3.76%    (p=0.000 n=10+10)
I have these now, so I'll send the PR.
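As a rough illustration of how per-size timings like the ones above can be gathered, the sketch below benchmarks a plain Go cumulative-sum loop at two slice lengths. The function names and the sizes (10 and 1000) are assumptions made for this example; the thread does not state what Small, Med, Large, and Huge correspond to, and this is not the benchmark code from the repository.

```go
package main

import (
	"fmt"
	"testing"
)

// cumSumLoop is a plain Go loop standing in for the code under test.
// It assumes len(dst) >= len(s).
func cumSumLoop(dst, s []float64) []float64 {
	sum := 0.0
	for i, v := range s {
		sum += v
		dst[i] = sum
	}
	return dst
}

// benchCumSum returns a benchmark body that runs cumSumLoop on
// slices of length n.
func benchCumSum(n int) func(b *testing.B) {
	return func(b *testing.B) {
		s := make([]float64, n)
		for i := range s {
			s[i] = float64(i)
		}
		dst := make([]float64, n)
		b.ResetTimer() // exclude setup from the measurement
		for i := 0; i < b.N; i++ {
			cumSumLoop(dst, s)
		}
	}
}

func main() {
	// Sizes are illustrative assumptions, not the ones used in the thread.
	for _, n := range []int{10, 1000} {
		r := testing.Benchmark(benchCumSum(n))
		fmt.Printf("CumSum n=%d: %s\n", n, r)
	}
}
```

In a real package the benchmark bodies would normally live in a `_test.go` file and be run with `go test -bench`, saving the output before and after the change so the two files can be compared with `benchstat old new`.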
Succo commented
Ok nice, it seems you have beaten me to it 👍