romeric/Fastor

Einsum: Enforce Output ordering for tensor networks

matthiasneuner opened this issue · 4 comments

For tensor network contractions using einsum, the ordering of the output is not guaranteed.
From the wiki:
Given that Fastor determines the output tensor implicitly, for tensor networks (more than two tensors in an einsum call) it might not always be apparent what the output tensor is since in these cases Fastor re-orders the indices to determine the best contraction pattern. In other words, if there are more than two tensors in an einsum call the output tensor may not be always ordered in any particular order. In such cases you can query this pattern. For example, say we have a contraction between 5 tensors.
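For illustration, the query the wiki refers to looks like this (a minimal sketch using the einsum_helper trait, which also appears in the full listings further down):

#include "Fastor/Fastor.h"

using namespace Fastor;

int main()
{
    enum {i_, j_, k_, l_};
    Tensor<double, 2, 3> A;
    Tensor<double, 3, 4> B;
    Tensor<double, 4, 5> C;
    // resulting_index reports the output index order Fastor picked
    using resulting_index = einsum_helper<Index<i_, j_>, Index<j_, k_>, Index<k_, l_>,
                                          decltype(A), decltype(B), decltype(C)>::resulting_index;
    std::cout << type_name<resulting_index>() << std::endl;
    return 0;
}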

However, it would be useful to be able to enforce a particular ordering, to facilitate expressions such as:

using A = Index<A_>;
using jin = Index<j_, i_, n_>;
using inkB = Index<i_, n_, k_, B_>;
using to_jAkB = OIndex<j_, A_, k_, B_>;

einsum<A, jin, inkB, to_jAkB>(TensorA, TensorB, TensorC);

Currently, the implicit permute step inside einsum may not produce the desired permutation, because the indices can be reordered for performance depending on the tensor sizes.

Is such a thing a reasonable request for a future release?

I am not sure I understand you correctly. Are you saying einsum should enforce a specific ordering by default?

Short answer: Yes, if OIndex is specified.

Currently, the index ordering of einsum is not guaranteed for tensor networks (as pointed out in the wiki).
Assume you have a tensor network expression. The resulting indices depend on the sizes of the tensors involved, so identical expressions can yield different index layouts:

#include "Fastor/Fastor.h"

using namespace Fastor;

int main(int argc, char *argv[])
{
#define SIZE_I 4
#define SIZE_J 8
#define SIZE_K 10
#define SIZE_L 10
#define SIZE_M 2
#define SIZE_N 2

    enum {i_, j_, k_, l_, m_, n_};
    using ijkl = Index<i_, j_, k_, l_>;
    using jm   = Index<j_, m_>;
    using kn   = Index<k_, n_>;

    // A contracts with B over j_ and with C over k_
    Tensor<double, SIZE_I, SIZE_J, SIZE_K, SIZE_L> A(0.0);
    Tensor<double, SIZE_J, SIZE_M> B(0.0);
    Tensor<double, SIZE_K, SIZE_N> C(0.0);

    // Query the index order Fastor picks for this network
    using resulting_index = einsum_helper<ijkl, jm, kn, decltype(A), decltype(B), decltype(C)>::resulting_index;

    std::cout << type_name<resulting_index>() << std::endl;

    return 0;
}

results in Fastor::Index<4ul, 0ul, 3ul, 5ul> (i.e. m_, i_, l_, n_ in terms of the enum above)

whereas the same program with only two sizes changed,

#define SIZE_K 4  // <----
#define SIZE_L 4  // <---- changes here

results in Fastor::Index<0ul, 3ul, 4ul, 5ul> (i.e. i_, l_, m_, n_)

The problem is that this unpredictable reordering makes passing an additional OIndex useless for tensor networks. For instance, depending on the preceding performance-based reordering, the following expression may not result in the desired ordering:

    using to_imnl = OIndex<i_, m_, n_, l_>;
    einsum<ijkl, jm, kn, to_imnl>(A, B, C);

But isn't this already the case with OIndex? Your code

using to_imnl = OIndex<i_, m_, n_, l_>;
auto E = einsum<ijkl, jm, kn, to_imnl>(A, B, C);
print(type_name<decltype(E)>());

does result in

Tensor<double, 4, 2, 2, 10> // IMNL

for the first case and for the second case

Tensor<double, 4, 2, 2, 4> // IMNL

In both cases you get the desired ordering that you requested. Can you give me an example where using OIndex fails to give you the order you requested?
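For completeness, here is a self-contained version of this check, assembled from the listings above (a minimal sketch; the sizes are those of the first case):

#include "Fastor/Fastor.h"

using namespace Fastor;

int main()
{
    enum {i_, j_, k_, l_, m_, n_};
    using ijkl = Index<i_, j_, k_, l_>;
    using jm   = Index<j_, m_>;
    using kn   = Index<k_, n_>;
    using to_imnl = OIndex<i_, m_, n_, l_>;

    Tensor<double, 4, 8, 10, 10> A(0.0);
    Tensor<double, 8, 2> B(0.0);
    Tensor<double, 10, 2> C(0.0);

    // OIndex requests the output order explicitly: i_, m_, n_, l_
    auto E = einsum<ijkl, jm, kn, to_imnl>(A, B, C);
    print(type_name<decltype(E)>());  // expected: Tensor<double, 4, 2, 2, 10>

    return 0;
}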

If you want to disable performance-based ordering, a.k.a. graph-search optimisation, you can use the compiler flag/macro -DFASTOR_KEEP_DP_FIXED.
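For reference, a minimal sketch of applying the macro in source, assuming it is read at inclusion time like other Fastor configuration macros (otherwise pass -DFASTOR_KEEP_DP_FIXED on the compiler command line):

// Assumption: the macro must be visible before the first Fastor include
#define FASTOR_KEEP_DP_FIXED
#include "Fastor/Fastor.h"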

Closing this now as I think this is a non-issue. Feel free to re-open if you think your problem is still not addressed.