Benchmarking
milankl opened this issue · 3 comments
This issue summarizes the performance of SoftPosit.jl, measured via conversions between Posit16 and Float32:
Conversion | v0.3 (SoftPosit-C) | v0.4 (SoftPosit.jl) | v0.5* |
---|---|---|---|
P16->F32 | 32ns | 0.76ns | 0.65ns |
F32->P16 | 100ns | 1.1ns | 0.865ns |
P16->F32->P16 | 120ns | 1.9ns | 1.4ns |
*Upcoming release, which will include #68, the new 2022 posit standard, and type-flexible conversions such that all PositN(::FloatN) conversions use a single function (with multiple dispatch).
Tested via:
julia> using SoftPosit, BenchmarkTools
julia> function f!(::Type{TB},A::Array{TA}) where {TB,TA}
           @inbounds for i in eachindex(A)
               A[i] = TA(TB(A[i]))
           end
       end
julia> function f!(B::Array{TB},A::Array{TA}) where {TB,TA}
           @inbounds for i in eachindex(A,B)
               B[i] = TB(A[i])
           end
       end
julia> A = Posit16.(rand(UInt16,1000000));
julia> B = rand(Float32,1000000);
julia> @btime f!($B,$A);
julia> @btime f!($A,$B);
julia> @btime f!($Float32,$A);
julia> @btime f!($Posit16,$B);
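The table lists nanosecond timings while the test loops over 1,000,000 elements, so the figures are presumably total run time divided by the array length. The following is a minimal sketch of that bookkeeping using only Base Julia. The names `convert_all!` and `ns_per_element` are hypothetical, and `Float16` is used purely as a stand-in for `Posit16` so the snippet runs without SoftPosit.jl; `@elapsed` is also much cruder than `@btime`.

```julia
# Element-wise conversion kernel, same shape as f!(B, A) in the benchmark above.
function convert_all!(B::Array{TB}, A::Array{TA}) where {TB,TA}
    @inbounds for i in eachindex(A, B)
        B[i] = TB(A[i])
    end
    return B
end

# Minimum wall time over `trials` runs, reported in nanoseconds per element.
function ns_per_element(B, A; trials=5)
    convert_all!(B, A)            # warm-up run so compilation is not timed
    t = Inf
    for _ in 1:trials
        t = min(t, @elapsed(convert_all!(B, A)))
    end
    return t / length(A) * 1e9    # seconds in total -> nanoseconds per element
end

A = rand(Float16, 1_000_000)      # assumption: Float16 stands in for Posit16
B = zeros(Float32, 1_000_000)
println(ns_per_element(B, A), " ns per element")
```

Taking the minimum over several runs mirrors what `@btime` reports and reduces noise from other processes.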
With #68, addition is about 1.6x faster than it was in v0.4:
julia> using SoftPosit
julia> A,B = Posit16.(rand(1000,1000)),Posit16.(rand(1000,1000));
julia> using BenchmarkTools
julia> @btime +($A,$B);
15.684 ms (2 allocations: 1.91 MiB)
compared to v0.4:
julia> @btime +($A,$B);
25.477 ms (2 allocations: 1.91 MiB)
Completely in-place, to remove memory allocations and GC from the game:
julia> using SoftPosit, BenchmarkTools
julia> @benchmark C .= A .+ B setup=(N=1000; A=Posit16.(rand(N,N)); B=Posit16.(rand(N,N)); C=similar(A))
BenchmarkTools.Trial: 141 samples with 1 evaluation.
 Range (min … max):  23.458 ms … 27.493 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.175 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.234 ms ± 442.509 μs ┊ GC (mean ± σ):  0.00% ± 0.00%
23.5 ms Histogram: frequency by time 25.9 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> pkgversion(SoftPosit)
v"0.4.0"
vs
julia> using SoftPosit, BenchmarkTools
julia> @benchmark C .= A .+ B setup=(N=1000; A=Posit16.(rand(N,N)); B=Posit16.(rand(N,N)); C=similar(A))
BenchmarkTools.Trial: 172 samples with 1 evaluation.
 Range (min … max):  19.808 ms … 21.446 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.347 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.382 ms ± 294.894 μs ┊ GC (mean ± σ):  0.00% ± 0.00%
19.8 ms Histogram: frequency by time 21.2 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> pkgversion(SoftPosit)
v"0.5.0"
Timings for me are closer, but still clearly better: the maximum time on main
(21.4 ms) is roughly 9% below the minimum time in v0.4.0 (23.5 ms).
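For reference, the ratios behind these comparisons can be recomputed directly from the timings reported in this thread. This is only arithmetic on the numbers quoted above; the variable names are illustrative.

```julia
# Timings in milliseconds, copied from the benchmark output in this thread.
alloc_v04, alloc_new = 25.477, 15.684     # allocating +($A,$B)
median_v04, median_v05 = 24.175, 20.347   # in-place C .= A .+ B, median
min_v04, max_v05 = 23.458, 21.446         # in-place, best v0.4.0 vs worst v0.5.0

speedup_alloc = alloc_v04 / alloc_new       # speedup of the allocating version
speedup_inplace = median_v04 / median_v05   # speedup of the in-place version
gap = 1 - max_v05 / min_v04                 # how far the worst new run beats the best old run

println("allocating +: ", round(speedup_alloc, digits=2), "x")
println("in-place +:   ", round(speedup_inplace, digits=2), "x")
println("max(v0.5.0) is ", round(100 * gap, digits=1), "% below min(v0.4.0)")
```

The two measurements come from different machines and setups, so the differing speedups are not directly comparable.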
Yeah, I mean one could define +(::PositN,::PositN) (and *, /) directly, without the conversion, which would likely give another big speedup. But at the moment it's only important that it's not much slower than where we started from. I like the conversion to float because it means less code that is easier to maintain, and I don't really have the time at the moment to write those functions. I just wanted to update SoftPosit.jl to the new standard first.
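To illustrate why arithmetic via float conversion keeps the code small, here is a minimal sketch with a hypothetical toy format. `Fix16` is a 16-bit fixed-point type invented for this example; it is not a posit and not part of SoftPosit.jl. Only the two conversions know about the bit layout, and every arithmetic operator is a one-liner on top of them.

```julia
# Hypothetical toy format (NOT a posit): 16-bit fixed point with 8 fractional bits.
struct Fix16
    bits::Int16
end

# The only format-specific code: conversions to and from Float32.
Base.Float32(x::Fix16) = Float32(x.bits) / 256f0
# Saturates at the representable range; NaN input would throw in round().
Fix16(x::Float32) = Fix16(round(Int16, clamp(x * 256f0, -32768f0, 32767f0)))

# Arithmetic: convert to float, compute there, round back into the format.
Base.:+(a::Fix16, b::Fix16) = Fix16(Float32(a) + Float32(b))
Base.:*(a::Fix16, b::Fix16) = Fix16(Float32(a) * Float32(b))
Base.:/(a::Fix16, b::Fix16) = Fix16(Float32(a) / Float32(b))

x, y = Fix16(1.5f0), Fix16(2.25f0)
println(Float32(x + y))   # sum computed in Float32, stored as Fix16
println(Float32(x * y))
```

The trade-off is the one described above: each operation pays for two conversions in and one conversion out, whereas a direct bit-level implementation would avoid them at the cost of considerably more code per operator.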