microsoft/qsharp-runtime

The trace simulator uses a non-optimal decomposition of the `CCNOT` operation

jond01 opened this issue · 0 comments

Please describe what you would like the maintenance work to be.

The trace simulator uses the following decomposition for CCNOT (CCNOT == CCX):

  1. operation CCX (control1 : Qubit, control2 : Qubit, target : Qubit) : Unit
    is Adj {
    PauliZFlip(PauliX, target);
    CCZ(control1, control2, target);
    Adjoint PauliZFlip(PauliX, target);
    }
  2. /// CCZ = exp( -iπ|111⟩⟨111| ) = exp( -iπ((I-Z)/2)⊗((I-Z)/2)⊗((I-Z)/2) )
    /// = exp(-i π/2³ I⊗I⊗I) ×
    /// exp( i π/2³ Z⊗I⊗I ) exp( i π/2³ I⊗Z⊗I ) exp( i π/2³ I⊗I⊗Z ) ×
    /// exp(-i π/2³ Z⊗Z⊗I ) exp(-i π/2³ I⊗Z⊗Z ) exp(-i π/2³ Z⊗I⊗Z ) ×
    /// exp( i π/2³ Z⊗Z⊗Z )
    /// Note that CCZ is symmetric with respect to all of its qubit arguments.
    operation CCZ (a : Qubit, b : Qubit, c : Qubit) : Unit
    {
    body (...)
    {
    // do not care about global phase because this CCZ implementation has no controlled version
    // this line and every line below uses 1 T gate
    InternalRzFrac(1, 3, a);
    InternalRzFrac(1, 3, b);
    InternalRzFrac(1, 3, c);
    ExpFracZZ(-1, 3, a, b);
    ExpFracZZ(-1, 3, b, c);
    ExpFracZZ(-1, 3, a, c);
    ExpFracZZZ(1, 3, a, b, c);
    }

However, this decomposition is naive and far from ideal.
A better decomposition is provided here:

  1. operation CCNOT (control1 : Qubit, control2 : Qubit, target : Qubit) : Unit is Adj + Ctl {
    body (...) {
    // [Page 15 of arXiv:1206.0758v3](https://arxiv.org/pdf/1206.0758v3.pdf#page=15)
    within {
    H(target);
    }
    apply {
    CCZ(control1, control2, target);
    }
    }
  2. internal operation CCZ (control1 : Qubit, control2 : Qubit, target : Qubit) : Unit is Adj {
    // [Page 15 of arXiv:1206.0758v3](https://arxiv.org/pdf/1206.0758v3.pdf#page=15)
    Adjoint T(control1);
    Adjoint T(control2);
    CNOT(target, control1);
    T(control1);
    CNOT(control2, target);
    CNOT(control2, control1);
    T(target);
    Adjoint T(control1);
    CNOT(control2, target);
    CNOT(target, control1);
    Adjoint T(target);
    T(control1);
    CNOT(control2, control1);
    }

The above is equivalent to the following circuit:
image

Indeed, when using the trace simulator with the latter decomposition - the depth is reduced from 15 to 8, and the T-depth is reduced from 5 to 4:

Metric Plain trace simulator Trace simulator with a custom decomposition
Depth 15 8
T-depth 5 4

See the complete code here:
https://gist.github.com/jond01/59d6e6732c55d8702479f8c6555e450c

This improvement is remarkable when estimating the resources (depth) required for a large quantum algorithm.
Is there any relevant reason for this inconsistency between the decompositions?