Jutho/TensorKit.jl

Long compile times for (n~12)-index tensors

Closed this issue · 7 comments

First, thank you for this awesome package as well as the documentation. I went through it today and learned quite a few things!

I am not entirely sure whether the following is really an issue, but here it is:

I am trying to build a matrix product state from a vector (e.g. the ground state of some spin chain, found by exact diagonalization of the Hamiltonian in a U(1) charge sector). I get very large, rapidly growing compile times of hundreds of seconds, even for L=12:

L = 8 takes ~10 seconds
L = 10 takes ~25 seconds
L = 12 takes ~210 seconds
(it took so long that I stopped the compilation for larger L!)

I understand that this is not an ideal way of writing this function, since it requires an (L+2)-index tensor (3-index tensors should be enough), so longer compile times are expected for tensors with more indices. My question/issue is therefore: are these compile times too large, or expected?

Here is the function, which you can test with vector2mps(L, rand(binomial(L, div(L,2)))):

using TensorKit, LinearAlgebra   # TensorKit extends LinearAlgebra.svd for TensorMaps

function vector2mps(L::Int, v::Vector{T}; m::Int=div(L,2)) where {T<:Number}

    V0 = U₁Space(U₁(0)=>1)    # dummy space for start
    VL = U₁Space(U₁(m)=>1)    # dummy space for end   
    Vd = U₁Space(U₁(x)=>1 for x=0:1)   # physical spaces

    # is there a better way to initialize?
    A = TensorMap(TensorKit.SectorDict(U₁(m)=>reshape(v,length(v), 1)),
              V0 ⊗ prod(Vd for _=1:L) ← VL)

    mps = Vector{TensorMap{U₁Space,2,1}}()
    # sweep from left to right, splitting off one MPS site per SVD
    for x = 1:L-1
        U,S,Vt = svd(permuteind(A, Tuple(1:2), Tuple(3:L-x+3)))
        push!(mps, U)
        A = S*Vt    # absorb the singular values into the remainder
    end
    push!(mps, permuteind(A, (1,2), (3,)))
    mps
end
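
To see the compile-time cost directly, here is a small timing sketch (nothing beyond the test call above; the first @time is dominated by compilation, the second measures the actual runtime):

# timing sketch: first call includes compilation, second call is pure runtime
L = 12
v = rand(binomial(L, div(L,2)))
@time vector2mps(L, v)
@time vector2mps(L, v)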
Jutho commented

Dear @amiragha, glad you like it. I hope to update and hopefully complete the manual soon.

Compilation times can indeed be somewhat of a problem, and the issue seems to be more pronounced in combination with code that is not type-stable. I mostly count on Julia itself to improve this situation (I believe this is one of the current focus points of the main Julia developers), but if there are design choices in TensorKit that make this problem worse, I am happy to reconsider them. I played around a bit with your example, but did not find a quick change in my code that makes compilation times significantly lower.
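
As a quick way to check the type-stability aspect (a sketch using the standard-library tooling; @code_warntype highlights non-concretely inferred types):

using InteractiveUtils   # @code_warntype is loaded by default at the REPL

L = 8
v = rand(binomial(L, div(L,2)))
@code_warntype vector2mps(L, v)   # non-concrete (red) types indicate instability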

Thanks for looking into it. Indeed, my first impression was that a quick fix isn't possible. I will keep playing with it...

I'll close the issue, since there isn't much that can be done on this end for now.

Jutho commented

I have a fix ready, i.e. I managed to isolate one specific part of the code which seems to give type inference a hard time, and replace it with something more inference-friendly. Compilation times are still high (order 10 seconds or more) for different values of L, due to the large number of methods that need to be compiled, but at least it doesn't blow up as quickly. I am currently running tests locally and will push the update if all tests pass.

Jutho commented

I've pushed to the master branch; could you also try it and see if this solves your problems (to a large extent)?

Thanks. Yes, the compilation time is now very reasonable. I tried values of L all the way into the 20s, where the computation time easily exceeded the compilation time. This example is now definitely usable.

I looked at your commit, and it seems like not much has changed except for the new, separate _get_permute function, which now knows the type of the fusion dictionary. Interesting how such a small change makes such a big difference.

Jutho commented

Yes, the result of permuting fusion trees is stored in a global cache (a least-recently-used LRU{Any,Any} dictionary), but type information about that result is lost in the process. However, the permute function can easily restore this type information, although this particular step seems to be hard for type inference.

In the old approach, this type information was restored within the permute function, which needs to be recompiled for every different N1 and N2. Now this step is deferred to another function, which will not be recompiled for different N1 and N2 (via @nospecialize), and this seems to help quite a lot.
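
Schematically, the pattern looks like this (a minimal sketch, not the actual TensorKit code; the cache and the names _cache_get! and cached are hypothetical stand-ins for the real _get_permute machinery):

using LRUCache   # provides the LRU type used for such global caches

const CACHE = LRU{Any,Any}(maxsize = 10^4)

# Function barrier: with @nospecialize this method is compiled only once,
# regardless of the key type or the type of the default computation.
function _cache_get!(@nospecialize(key), @nospecialize(default))
    return get!(default, CACHE, key)
end

# Thin caller: it knows the concrete result type T up front and restores it
# with a type assertion, which is easy for type inference to handle.
cached(key, default, ::Type{T}) where {T} = _cache_get!(key, default)::T

For example, cached((:square, 3), () -> 9, Int) returns 9 with a concretely inferred Int type, even though the cache itself stores values as Any; in the fusion-tree case, T would be the concrete dictionary type of the permutation coefficients, which the caller can compute from its argument types.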

Jutho commented

I will now close this with a more relaxed mind :-).