milankl/SoftPosit.jl

Benchmarking

milankl opened this issue · 3 comments

This issue summarizes the performance of SoftPosit.jl, measured via conversions between Posit16 and Float32.

Conversion      v0.3 (SoftPosit-C)   v0.4 (SoftPosit.jl)   v0.5*
P16->F32        32ns                 0.76ns                0.65ns
F32->P16        100ns                1.1ns                 0.865ns
P16->F32->P16   120ns                1.9ns                 1.4ns

*Upcoming release, which will include #68, the new 2022 posit standard, and type-flexible conversions such that all PositN(::FloatN) conversions use a single function (with multiple dispatch). Tested via

julia> using SoftPosit, BenchmarkTools
julia> function f!(::Type{TB},A::Array{TA}) where {TB,TA}
           @inbounds for i in eachindex(A)
               A[i] = TA(TB(A[i]))
           end
       end
julia> function f!(B::Array{TB},A::Array{TA}) where {TB,TA}
           @inbounds for i in eachindex(A,B)
               B[i] = TB(A[i])
           end
       end

julia> A = Posit16.(rand(UInt16,1000000));
julia> B = rand(Float32,1000000);
julia> @btime f!($B,$A);
julia> @btime f!($A,$B);
julia> @btime f!($Float32,$A);
julia> @btime f!($Posit16,$B);
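The footnote above mentions type-flexible conversions in which a single function serves every float width through multiple dispatch. As a rough illustration of that idea only, here is a minimal sketch of one parametric method that reads the exponent of any IEEE float. The name `unbiased_exponent` is hypothetical and not part of SoftPosit.jl, and the sketch relies on unexported `Base` helpers (`Base.uinttype`, `Base.exponent_mask`, `Base.significand_bits`, `Base.exponent_bias`), so treat it as an assumption-laden example rather than the package's implementation.

```julia
# Hypothetical helper (not part of SoftPosit.jl): one generic method that
# serves Float16, Float32 and Float64 alike. Valid for normal, finite inputs.
function unbiased_exponent(x::T) where {T<:Base.IEEEFloat}
    U = Base.uinttype(T)                   # UInt16, UInt32 or UInt64
    bits = reinterpret(U, x)               # raw bit pattern of x
    # Isolate the exponent field and shift it down past the significand bits.
    biased = (bits & Base.exponent_mask(T)) >> Base.significand_bits(T)
    return Int(biased) - Base.exponent_bias(T)
end

println(unbiased_exponent(1.0f0))          # Float32 input
println(unbiased_exponent(8.0))            # Float64 input
println(unbiased_exponent(Float16(0.5)))   # Float16 input
```

The compiler specializes the method for each concrete `T`, so the type-dependent constants are resolved at compile time and no per-width copies of the code are needed.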

With #68, addition is about 1.6x faster than it was in v0.4 (15.7 ms vs 25.5 ms for adding two 1000x1000 Posit16 matrices):

julia> using SoftPosit
julia> A,B = Posit16.(rand(1000,1000)),Posit16.(rand(1000,1000));
julia> using BenchmarkTools
julia> @btime +($A,$B);
  15.684 ms (2 allocations: 1.91 MiB)

compared to v0.4:

julia> @btime +($A,$B);
  25.477 ms (2 allocations: 1.91 MiB)

Completely in-place, to remove memory allocations and GC from the game:

julia> using SoftPosit, BenchmarkTools

julia> @benchmark C .= A .+ B setup=(N=1000; A=Posit16.(rand(N,N)); B=Posit16.(rand(N,N)); C=similar(A))
BenchmarkTools.Trial: 141 samples with 1 evaluation.
 Range (min … max):  23.458 ms …  27.493 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.175 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.234 ms ± 442.509 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▁  █▂▁▂
  ▃▁▃▄▁▅▁▁▅▆▆█▆██▇▇█████▆▆▅▅▃▆▄▃▃▁▄▁▁▃▃▁▁▁▁▃▁▁▁▁▁▁▃▁▁▁▁▃▁▁▁▁▁▃ ▃
  23.5 ms         Histogram: frequency by time         25.9 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> pkgversion(SoftPosit)
v"0.4.0"

vs

julia> using SoftPosit, BenchmarkTools

julia> @benchmark C .= A .+ B setup=(N=1000; A=Posit16.(rand(N,N)); B=Posit16.(rand(N,N)); C=similar(A))
BenchmarkTools.Trial: 172 samples with 1 evaluation.
 Range (min … max):  19.808 ms …  21.446 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.347 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.382 ms ± 294.894 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

               █▁ ▃      ▆ ▂
  ▄▁▁▁▄▄▁▅▃▅▇▁▇██▆█▆▇▇▅▅██▆█▅▆▅▇██▄▇▄▅▁▄▁▃▄▃▄▅▃▃▅▄▁▄▁▁▃▁▁▁▁▃▃▃ ▃
  19.8 ms         Histogram: frequency by time         21.2 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> pkgversion(SoftPosit)
v"0.5.0"
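To see why the in-place form takes allocations and GC out of the measurement, the following sketch compares `A + B` with `C .= A .+ B` using only Base's `@allocated`. It is an illustration under assumptions: `Float32` stands in for `Posit16` so that it runs without SoftPosit.jl, and the helper names `add_alloc` and `add_inplace!` are hypothetical.

```julia
# Float32 stands in for Posit16 so the sketch runs without SoftPosit.jl.
add_alloc(A, B) = A + B                     # allocates a new result array
add_inplace!(C, A, B) = (C .= A .+ B; C)    # fused broadcast writes into C

N = 200
A = rand(Float32, N, N)
B = rand(Float32, N, N)
C = similar(A)

# Call both once first so that compilation is not counted below.
add_alloc(A, B)
add_inplace!(C, A, B)

bytes_alloc   = @allocated add_alloc(A, B)
bytes_inplace = @allocated add_inplace!(C, A, B)
println("allocating: ", bytes_alloc, " bytes, in-place: ", bytes_inplace, " bytes")
```

The allocating version has to create an `N`x`N` result on every call, whereas the in-place version reuses `C`, which matches the "0 bytes" memory estimates in the benchmarks above.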

My timings are closer together, but main is still clearly faster: its maximum time (21.4 ms) is about 9% below the minimum time on v0.4.0 (23.5 ms).

Yeah, one could define +(::PositN,::PositN) (and *, /) directly, without the conversion, which would likely give another big speedup. At the moment, though, it only matters that it's not much slower than where we started. I like the conversion to float because it means less code that is easier to maintain, and I don't really have the time right now to write dedicated functions. I just wanted to update SoftPosit.jl to the new standard first.
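The "less code" argument for arithmetic via float conversion can be illustrated with a toy type. This is a sketch under stated assumptions, not SoftPosit.jl's source: `Toy16` is a hypothetical stand-in that stores a `Float16` instead of real posit bits, so that the example runs without the package.

```julia
# Toy16 is a hypothetical stand-in for a 16-bit posit; it stores a Float16
# purely so the example runs without SoftPosit.jl.
struct Toy16
    x::Float16
end

Base.Float32(a::Toy16) = Float32(a.x)      # "decode" to a hardware float
Toy16(f::Float32) = Toy16(Float16(f))      # "encode" (round) back to 16 bits

# One loop defines all four operations: convert, compute in Float32, convert back.
for op in (:+, :-, :*, :/)
    @eval Base.$op(a::Toy16, b::Toy16) = Toy16($op(Float32(a), Float32(b)))
end

s = Toy16(1.5f0) + Toy16(2.25f0)
p = Toy16(3.0f0) * Toy16(0.5f0)
println(Float32(s), " ", Float32(p))
```

Each operation costs two decodes and one encode, which is the overhead a hand-written +(::PositN,::PositN) would avoid, at the price of separate bit-level code per operation.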