Compile HSL with -O3 optimization
Closed this issue · 1 comment
amontoison commented
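After some discussions about performance with @geoffroyleconte, I compared the OpenBLAS32 and MKL backends on the `Boeing/pwtk` matrix from the SuiteSparse Matrix Collection, first with HSL built using the default flags and then with HSL built using `-O3`.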
We should add `-O3` after `gfortran` and `gcc` by default:
```julia
const HSL_FC = haskey(ENV, "HSL_FC") ? ENV["HSL_FC"] : "gfortran -O3"
const HSL_F77 = haskey(ENV, "HSL_F77") ? ENV["HSL_F77"] : HSL_FC
const HSL_CC = haskey(ENV, "HSL_CC") ? ENV["HSL_CC"] : "gcc -O3"
```
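The same defaulting logic can be written more compactly with `get`, which returns the fallback when a key is absent. In this minimal sketch, `compiler_cmd` is a hypothetical helper (not part of HSL.jl) that takes the environment as an argument so it can be exercised with a plain `Dict`. It also makes one consequence of the proposal visible: setting `HSL_FC` or `HSL_CC` replaces the whole default string, `-O3` included.

```julia
# Hypothetical helper (not part of HSL.jl): return env[key] if present,
# otherwise the given default. Works with ENV or any AbstractDict.
compiler_cmd(env::AbstractDict, key::AbstractString, default::AbstractString) =
    get(env, key, default)

# Simulated environment in which the user overrides only the C compiler.
env = Dict("HSL_CC" => "gcc-12")

hsl_fc  = compiler_cmd(env, "HSL_FC", "gfortran -O3")  # no override: default kept
hsl_f77 = compiler_cmd(env, "HSL_F77", hsl_fc)         # falls back to hsl_fc
hsl_cc  = compiler_cmd(env, "HSL_CC", "gcc -O3")       # override wins, so -O3 is lost

println((hsl_fc, hsl_f77, hsl_cc))
```

In a build script the real environment would be passed instead, e.g. `compiler_cmd(ENV, "HSL_FC", "gfortran -O3")`.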
```julia
# Current version of HSL
using HSL, MatrixMarket, SuiteSparseMatrixCollection
using LinearAlgebra, Printf, BenchmarkTools

# Download the Boeing/pwtk matrix from the SuiteSparse Matrix Collection.
ssmc = ssmc_db(verbose=false)
matrix = ssmc_matrices(ssmc, "Boeing", "pwtk")
path = fetch_ssmc(matrix, format="MM")
n = matrix.nrows[1]
A = MatrixMarket.mmread(joinpath(path[1], "$(matrix.name[1]).mtx"))
b = ones(n)
b_norm = norm(b)

# Solve Ax = b with the default backend (OpenBLAS32).
LDL = @btime Ma57($A) # 7.566 s (36 allocations: 343.44 MiB)
@btime ma57_factorize($LDL) # 39.155 s (2 allocations: 851.30 KiB)
@btime ma57_solve($LDL, $b) # 497.909 ms (6 allocations: 4.16 MiB)

# Forward BLAS/LAPACK calls to MKL through libblastrampoline.
import LinearAlgebra, MKL_jll
LinearAlgebra.BLAS.lbt_forward(MKL_jll.libmkl_rt_path, clear=true, verbose=true)

# Solve Ax = b with MKL.
LDL = @btime Ma57($A) # 7.466 s (36 allocations: 343.44 MiB)
@btime ma57_factorize($LDL) # 25.038 s (2 allocations: 851.30 KiB)
@btime ma57_solve($LDL, $b) # 230.605 ms (6 allocations: 4.16 MiB)
```
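Because `lbt_forward(...; clear=true)` replaces the active backend globally, it can be worth confirming which libraries are loaded before timing. This is a minimal sketch assuming Julia 1.7 or later, where `LinearAlgebra.BLAS.get_config()` is available; `loaded_blas` is a hypothetical helper name. Whether HSL's own BLAS calls go through libblastrampoline depends on how HSL was linked, although the change in timings after switching to MKL suggests they do.

```julia
using LinearAlgebra

# Hypothetical helper: return (library file name, integer interface) for every
# BLAS/LAPACK library libblastrampoline currently forwards to.
# :lp64 means 32-bit integers (as in OpenBLAS32), :ilp64 means 64-bit integers.
function loaded_blas()
    cfg = LinearAlgebra.BLAS.get_config()
    return [(basename(lib.libname), lib.interface) for lib in cfg.loaded_libs]
end

for (name, iface) in loaded_blas()
    println(rpad(name, 30), iface)
end
```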
```julia
# HSL compiled with -O3
using HSL, MatrixMarket, SuiteSparseMatrixCollection
using LinearAlgebra, Printf, BenchmarkTools

# Download the Boeing/pwtk matrix from the SuiteSparse Matrix Collection.
ssmc = ssmc_db(verbose=false)
matrix = ssmc_matrices(ssmc, "Boeing", "pwtk")
path = fetch_ssmc(matrix, format="MM")
n = matrix.nrows[1]
A = MatrixMarket.mmread(joinpath(path[1], "$(matrix.name[1]).mtx"))
b = ones(n)
b_norm = norm(b)

# Solve Ax = b with the default backend (OpenBLAS32).
LDL = @btime Ma57($A) # 3.123 s (36 allocations: 343.44 MiB)
@btime ma57_factorize($LDL) # 14.857 s (2 allocations: 851.30 KiB)
@btime ma57_solve($LDL, $b) # 314.188 ms (6 allocations: 4.16 MiB)

# Forward BLAS/LAPACK calls to MKL through libblastrampoline.
import LinearAlgebra, MKL_jll
LinearAlgebra.BLAS.lbt_forward(MKL_jll.libmkl_rt_path, clear=true, verbose=true)

# Solve Ax = b with MKL.
LDL = @btime Ma57($A) # 3.345 s (36 allocations: 343.44 MiB)
@btime ma57_factorize($LDL) # 9.488 s (2 allocations: 851.30 KiB)
@btime ma57_solve($LDL, $b) # 186.227 ms (6 allocations: 4.16 MiB)
```
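To put the two runs side by side, this sketch computes the speedup of the `-O3` build over the default build for each backend and phase. It uses only the minimum times that `@btime` reported above, converted to seconds; `timings` and `speedup` are illustrative names.

```julia
using Printf

# Minimum times (seconds) reported by @btime above for Boeing/pwtk.
# Key: (backend, phase); value: (default build, -O3 build).
timings = Dict(
    ("OpenBLAS32", "Ma57")           => (7.566,    3.123),
    ("OpenBLAS32", "ma57_factorize") => (39.155,   14.857),
    ("OpenBLAS32", "ma57_solve")     => (0.497909, 0.314188),
    ("MKL",        "Ma57")           => (7.466,    3.345),
    ("MKL",        "ma57_factorize") => (25.038,   9.488),
    ("MKL",        "ma57_solve")     => (0.230605, 0.186227),
)

# Ratio > 1 means the -O3 build is faster.
speedup(default, o3) = default / o3

for backend in ("OpenBLAS32", "MKL"), phase in ("Ma57", "ma57_factorize", "ma57_solve")
    d, o = timings[(backend, phase)]
    @printf("%-10s %-15s %6.2fx\n", backend, phase, speedup(d, o))
end
```

With these numbers, `-O3` speeds up `Ma57` by about 2.2x to 2.4x, `ma57_factorize` by about 2.6x on both backends, and `ma57_solve` by about 1.2x to 1.6x. Combining MKL with `-O3` brings factorization from 39.155 s down to 9.488 s, roughly 4.1x.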
dpo commented
We could add `-O3` directly to the `build_*.jl` files, couldn't we?
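One way to act on this suggestion is to keep the compiler and the optimization flags in separate variables inside the build script, so `-O3` still applies when a user points `HSL_FC` at a different compiler. This is a sketch only: `HSL_FFLAGS`, `HSL_CFLAGS`, and the source file name `ma57d.f` are hypothetical, and the actual `build_*.jl` files may assemble their commands differently. Each string is passed through `split` before interpolation because Julia inserts a single string containing spaces into a backtick command as one argument, whereas an interpolated vector is spliced in as separate arguments.

```julia
# Hypothetical build-script fragment: compiler and flags are configured
# independently, with -O3 as the default optimization level.
const HSL_FC     = get(ENV, "HSL_FC", "gfortran")
const HSL_FFLAGS = get(ENV, "HSL_FFLAGS", "-O3")   # hypothetical variable
const HSL_CC     = get(ENV, "HSL_CC", "gcc")
const HSL_CFLAGS = get(ENV, "HSL_CFLAGS", "-O3")   # hypothetical variable

# split() turns e.g. "gfortran -O3" into separate words; interpolating the
# resulting vector into a Cmd passes each word as its own argument.
fortran_cmd(src) = `$(split(HSL_FC)) $(split(HSL_FFLAGS)) -c $src`
c_cmd(src)       = `$(split(HSL_CC)) $(split(HSL_CFLAGS)) -c $src`

# Build the command without running it; run(cmd) would invoke the compiler.
cmd = fortran_cmd("ma57d.f")
println(cmd.exec)
```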