gonum/floats

Using cumsum from internal/asm


Succo commented

It seems that there is an assembly version of cumsum in https://github.com/gonum/internal/tree/master/asm/f64

Any reason for not using it here yet?

None. The internal/asm/... packages were only recently changed to include things like that. You are welcome to send a PR adding calls to f64.
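For readers unfamiliar with the operation being discussed, here is a minimal pure-Go sketch of a cumulative sum. The helper name `cumSum` and its panic message are hypothetical and chosen for illustration; this is not the gonum source, and the assembly kernel in internal/asm cannot be imported from outside gonum, so it is not called here.

```go
package main

import "fmt"

// cumSum is a hypothetical reference helper, not the gonum implementation.
// It stores in dst[i] the sum of s[0] through s[i] and returns dst.
func cumSum(dst, s []float64) []float64 {
	if len(dst) != len(s) {
		panic("cumSum: length mismatch")
	}
	if len(s) == 0 {
		return dst
	}
	dst[0] = s[0]
	for i := 1; i < len(s); i++ {
		// Each element depends on the previous running total.
		dst[i] = dst[i-1] + s[i]
	}
	return dst
}

func main() {
	s := []float64{1, 2, 3, 4}
	dst := make([]float64, len(s))
	fmt.Println(cumSum(dst, s)) // prints [1 3 6 10]
}
```

A PR of the kind suggested above would presumably replace a loop like this with a call into the f64 package, keeping the length check in the exported function.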

Add, CumProd, CumSum, Div, DivTo

$ benchstat old new 
name            old time/op  new time/op  delta
AddSmall-8      10.3ns ±14%   9.9ns ± 2%     ~     (p=0.386 n=10+9)
AddMed-8         314ns ± 6%   306ns ± 2%     ~     (p=0.076 n=10+8)
AddLarge-8      40.6µs ± 1%  41.7µs ± 1%   +2.72%  (p=0.000 n=10+9)
AddHuge-8       14.6ms ± 0%  14.9ms ± 3%   +2.41%  (p=0.000 n=10+10)
CumProdSmall-8  13.7ns ± 1%  13.4ns ± 5%   -1.90%  (p=0.006 n=10+10)
CumProdMed-8    4.22µs ± 8%  2.42µs ±22%  -42.54%  (p=0.000 n=10+10)
CumProdLarge-8   268µs ± 1%    76µs ± 1%  -71.61%  (p=0.000 n=9+10)
CumProdHuge-8   27.4ms ± 1%  14.1ms ± 1%  -48.44%  (p=0.000 n=10+10)
CumSumSmall-8   13.6ns ± 0%  14.1ns ± 0%   +3.68%  (p=0.000 n=6+9)
CumSumMed-8     2.63µs ± 1%  0.75µs ± 1%  -71.43%  (p=0.000 n=10+9)
CumSumLarge-8    267µs ± 1%    74µs ± 1%  -72.21%  (p=0.000 n=10+9)
CumSumHuge-8    27.6ms ± 2%  15.1ms ± 1%  -45.07%  (p=0.000 n=10+10)
DivSmall-8      11.7ns ± 0%   9.8ns ± 1%  -16.96%  (p=0.000 n=9+9)
DivMed-8        1.18µs ± 1%  0.59µs ± 0%  -49.57%  (p=0.000 n=10+6)
DivLarge-8       118µs ± 2%    59µs ± 0%  -49.66%  (p=0.000 n=10+9)
DivHuge-8       16.1ms ± 1%  14.7ms ± 1%   -8.20%  (p=0.000 n=10+10)
DivToSmall-8    11.8ns ± 1%  12.3ns ± 0%   +4.59%  (p=0.000 n=10+7)
DivToMed-8      1.18µs ± 1%  0.60µs ± 3%  -49.03%  (p=0.000 n=10+10)
DivToLarge-8     118µs ± 1%    60µs ± 2%  -49.23%  (p=0.000 n=10+10)
DivToHuge-8     20.5ms ± 1%  19.7ms ± 0%   -3.76%  (p=0.000 n=10+9)

I have these now, so I'll send the PR.
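As a rough illustration of how timings like those above can be collected, the sketch below benchmarks a pure-Go cumulative sum at several slice lengths using `testing.Benchmark` from the standard library. The helper names and the lengths 10, 1000 and 100000 are assumptions made for this example; the thread does not state which sizes the Small, Med, Large and Huge benchmarks use, and the numbers it prints will differ from the table.

```go
package main

import (
	"fmt"
	"testing"
)

// cumSum is a hypothetical pure-Go helper standing in for the function under test.
func cumSum(dst, s []float64) []float64 {
	if len(dst) != len(s) {
		panic("cumSum: length mismatch")
	}
	if len(s) == 0 {
		return dst
	}
	dst[0] = s[0]
	for i := 1; i < len(s); i++ {
		dst[i] = dst[i-1] + s[i]
	}
	return dst
}

// benchCumSum times cumSum on a slice of length n (n is an assumed size).
func benchCumSum(n int) testing.BenchmarkResult {
	s := make([]float64, n)
	for i := range s {
		s[i] = float64(i)
	}
	dst := make([]float64, n)
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			cumSum(dst, s)
		}
	})
}

func main() {
	for _, n := range []int{10, 1000, 100000} {
		r := benchCumSum(n)
		fmt.Printf("n=%d: %d ns/op\n", n, r.NsPerOp())
	}
}
```

In a real package the same loops would normally live in `_test.go` files as `BenchmarkXxx` functions, with the output of `go test -bench` before and after the change saved to files and compared with benchstat.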

Succo commented

Ok nice, it seems you have beaten me to it 👍