FastLogSumExp.jl

Fast logsumexp functions that leverage LoopVectorization.

This package provides specialized implementations of logsumexp(X::AbstractVector) and logsumexp(X::AbstractMatrix; dims=2) for both Float64 and ForwardDiff.Dual element types. These versions are 5-10x faster than LogExpFunctions.logsumexp. They use LoopVectorization.@turbo and, behind the scenes, VectorizationBase.vexp and SLEEFPirates.log_fast.
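As a rough illustration of the approach described above, here is a minimal sketch of a max-shifted logsumexp for Float64 inputs using @turbo. The names lse_turbo and lse_turbo_rows are hypothetical and are not this package's API. The sketch assumes LoopVectorization is installed and does not cover the ForwardDiff.Dual case.

```julia
using LoopVectorization  # provides @turbo

# Vector case: log(sum(exp.(x))), shifted by the maximum for numerical stability.
# Hypothetical helper, not the package's own function.
function lse_turbo(x::AbstractVector{Float64})
    m = maximum(x)               # throws on empty input
    isfinite(m) || return m      # avoid Inf - Inf = NaN in the loop below
    s = 0.0
    @turbo for i in eachindex(x)
        s += exp(x[i] - m)       # every term is in (0, 1], so no overflow
    end
    return m + log(s)
end

# Matrix case, reducing over columns (dims=2): one result per row.
# Rows whose maximum is not finite are not special-cased in this sketch.
function lse_turbo_rows(X::AbstractMatrix{Float64})
    n, k = size(X)
    m = vec(maximum(X; dims=2))  # row-wise maxima
    out = zeros(n)
    @turbo for i in 1:n, j in 1:k
        out[i] += exp(X[i, j] - m[i])
    end
    @turbo for i in 1:n
        out[i] = m[i] + log(out[i])
    end
    return out
end
```

Subtracting the maximum before exponentiating keeps inputs such as [1000.0, 1000.0] from overflowing. The row-wise version allocates its temporaries on each call. An in-place variant that reuses buffers would be a natural next step.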

See the issue at JuliaSIMD/LoopVectorization.jl#437. Thanks to @chriselrod for pointing me to https://github.com/PumasAI/SimpleChains.jl/blob/main/src/forwarddiff_matmul.jl.

Benchmarks:

  "M" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["M", "Matrix"]
          "Float64" => 3-element BenchmarkTools.BenchmarkGroup:
                  tags: ["Float64"]
                  "LogExpFunctions" => Trial(28.400 μs)
                  "Fast LogExp" => Trial(11.300 μs)
                  "Turbo" => Trial(5.100 μs)
          "Dual" => 3-element BenchmarkTools.BenchmarkGroup:
                  tags: ["Dual"]
                  "Reinterp" => Trial(12.100 μs)
                  "LogExpFunctions" => Trial(56.100 μs)
                  "Fast LogExp" => Trial(26.400 μs)
  "V" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["V", "Vector"]
          "Float64" => 2-element BenchmarkTools.BenchmarkGroup:
                  tags: ["Float64"]
                  "LogExpFunctions" => Trial(5.900 μs)
                  "Turbo" => Trial(1.700 μs)
          "Dual" => 3-element BenchmarkTools.BenchmarkGroup:
                  tags: ["Dual"]
                  "Reinterp" => Trial(2.300 μs)
                  "LogExpFunctions" => Trial(11.800 μs)
                  "Reinterp no tmp" => Trial(2.300 μs)
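
A nested BenchmarkGroup like the one printed above could be assembled roughly as follows. This is a sketch under assumptions, not the script that produced the numbers. The input size, the "Shifted" label, and the lse_shifted helper (Base Julia only, standing in for a candidate implementation) are all hypothetical.

```julia
using BenchmarkTools
using LogExpFunctions

# Hypothetical stand-in for a candidate implementation (Base Julia only).
function lse_shifted(x::AbstractVector{Float64})
    m = maximum(x)
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        s += exp(x[i] - m)
    end
    return m + log(s)
end

x = randn(1_000)  # assumed input size; the original sizes are not stated

# Build the nested groups with the same tag layout as the output above.
suite = BenchmarkGroup()
suite["V"] = BenchmarkGroup(["V", "Vector"])
suite["V"]["Float64"] = BenchmarkGroup(["Float64"])
suite["V"]["Float64"]["LogExpFunctions"] = @benchmarkable LogExpFunctions.logsumexp($x)
suite["V"]["Float64"]["Shifted"] = @benchmarkable lse_shifted($x)

# Cap each benchmark at about one second, then print the nested results.
results = run(suite; seconds = 1)
display(results)
```

Interpolating the input with $x keeps global-variable access out of the measured time. Timings depend on hardware and input size, so absolute numbers will differ from those listed above.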