`PlusOneVector` performance regression
Closed this issue · 0 comments
mtsokol commented
Hi @willow-ahrens,
Here's my attempt to reproduce the PlusOneVector performance regression. I run two short examples: one with native Julia Vectors and a second with PlusOneVector. I would expect both to run with the same performance, but the second one is roughly 10x slower:
using Finch
using BenchmarkTools
LEN = 10000;
a = Tensor(rand(LEN, LEN - 10) * 10);
b = Tensor(rand(LEN - 10, LEN) * 10);
s = fsprand(LEN, LEN, 0.0001);
a_l = lazy(swizzle(a, 1, 2));
b_l = lazy(swizzle(b, 1, 2));
s_l = lazy(swizzle(s, 1, 2));
# Sampled product: out[i, j, 1] = sum over k of s[i, j] * a[i, k] * b[k, j]
plan = sum(s_l[:, :, nothing] .* a_l[:, nothing, :] .* permutedims(b_l, (2, 1))[nothing, :, :], dims=3);
compute(plan);
julia> @benchmark compute(plan)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
Single result which took 9.678 s (0.00% GC) to evaluate,
with a memory estimate of 2.90 MiB, over 599 allocations.
# ============ Same computation, with the COO coordinate table wrapped in PlusOneVector ============
# Rebuild s with zero-based coordinate vectors wrapped in PlusOneVector; indices and values are unchanged.
new_tbl = (PlusOneVector(s.lvl.tbl[1] .- 1), PlusOneVector(s.lvl.tbl[2] .- 1));
new = SparseCOOLevel(s.lvl.lvl, s.lvl.shape, s.lvl.ptr, new_tbl);
s_new = Tensor(new);
a_l = lazy(swizzle(a, 1, 2));
b_l = lazy(swizzle(b, 1, 2));
s_l = lazy(swizzle(s_new, 1, 2));
plan = sum(s_l[:, :, nothing] .* a_l[:, nothing, :] .* permutedims(b_l, (2, 1))[nothing, :, :], dims=3);
compute(plan);
julia> @benchmark compute(plan)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
Single result which took 98.084 s (7.89% GC) to evaluate,
with a memory estimate of 40.44 GiB, over 2317014172 allocations.
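For reference, a quick way to check whether the overhead comes from PlusOneVector indexing itself, rather than from how compute specializes on the level type, might be to benchmark scalar reads directly. This is just a sketch: scan is a helper I made up for illustration, and it assumes PlusOneVector can be indexed like an ordinary AbstractVector (presumably it stores zero-based data and adds one on each read):

v = collect(1:1_000_000);
pv = PlusOneVector(v .- 1);   # same values as v, wrapped

function scan(x)
    # plain scalar-indexing loop, so only getindex cost is measured
    acc = zero(eltype(x))
    @inbounds for i in eachindex(x)
        acc += x[i]
    end
    return acc
end

@btime scan($v);
@btime scan($pv);

If the two timings are comparable, the regression is more likely in how the lowered kernel handles the PlusOneVector-backed SparseCOOLevel than in the wrapper's per-element overhead.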