Incorrect optimization of multi-parameter gates
Closed this issue ยท 19 comments
It's unclear if/how much parameterized gates are supported -- the README does state
We do not support parameterized gates currently.
But parameterized gates (e.g., u1
, u2
, and u3
) are implemented. However, I'm observing incorrect optimization results. For example, below is a minimal experiment to reproduce an optimization error:
To generate ECC:
- use params
q=1
,m=4
,n=2
- use gate-set
{GateType::add, GateType::u2}
The input circuit qasm to be optimized:
OPENQASM 2.0;
include "qelib1.inc";
qreg q[1];
u2(0, pi) q[0];
u2(0, pi) q[0];
where the two u2
gates should cancel each other. But the "optimized" circuit as produced by quartz is:
include "qelib1.inc";
qreg q[1];
u2(pi*1.000000,pi*1.000000) q[0];
which is mathematically incorrect.
Thanks for reporting the issue! Could you please post your code so that we can better trace the bug?
Thanks for following up, @zikun-li!
To reproduce, below please find my test code:
#include "gen_ecc_set.h"
#include "quartz/tasograph/tasograph.h"
using namespace quartz;
int main() {
int q = 1, m = 4, n = 2;
std::vector<GateType> gate_set{GateType::add, GateType::u2};
std::string ecc_name = "test_";
std::string ecc_fpath = ecc_name + "complete_ECC_set.json";
gen_ecc_set(gate_set, ecc_name, true, true, q, m, n);
const std::string qasm = R""""(
OPENQASM 2.0;
include "qelib1.inc";
qreg q[1];
u2(0, pi) q[0];
u2(0, pi) q[0];
)"""";
gate_set.insert(gate_set.end(), {
GateType::input_qubit, GateType::input_param
});
Context ctx(gate_set);
auto graph = Graph::from_qasm_str(&ctx, qasm);
auto optimized_graph = graph->optimize(
&ctx, ecc_fpath, "trivial_test", /*print_message=*/ true
);
std::cout << "\ninput circuit:" << std::endl;
graph->to_qasm(true, false);
std::cout << "\noutput circuit:" << std::endl;
optimized_graph->to_qasm(true, false);
return 0;
}
which produces the following output for me:
*** ch(,q=t) = 102
BFS: 1 representative DAGs to search with 0 gates.
Considering a total of 103 DAGs split into 103 hash values...
Processed 103 hash values that had only 1 circuit sequence, now processing the remaining 0 ones with 2 or more circuit sequences...
Start checking equivalence with different hash...
Solver invoked 0 times to find 0 equivalences with different hash, including 0 out of 0 possible equivalences under phase shift.
0 equivalences found in 0.022850895998999476 seconds (solver invoked 0 times for 103 DAGs with 103 different hash values and 0 potential equivalences), output 103 equivalence classes.
Json saved in 0.007132362996344455 seconds.
BFS: 102 representative DAGs to search with 1 gates.
Considering a total of 487 DAGs split into 298 hash values...
Processed 109 hash values that had only 1 circuit sequence, now processing the remaining 189 ones with 2 or more circuit sequences...
Start checking equivalence with different hash...
Solver invoked 3 times to find 3 equivalences with different hash, including 0 out of 0 possible equivalences under phase shift.
189 equivalences found in 0.8926885240070987 seconds (solver invoked 189 times for 487 DAGs with 298 different hash values and 189 potential equivalences), output 298 equivalence classes.
Json saved in 0.036345964996144176 seconds.
Representative set saved in 0.005 seconds.
Before ECC simplification: num_total_dags = 487, num_equivalence_classes = 295, #transformations test = 384
test generated. Running Time (s): 1.365
Pruning Time (s): 0.023
Verification Time (s): 1.309
Num_total_dags = 70, num_equivalence_classes = 35
*** Number of transformations of test = 70
Number of xfers: 70
greedy_optimize(): Number of xfers that reduce cost: 0
greedy_optimize(): cost optimized from 2 to 2
[trivial_test] Best cost: 1.000000 candidate number: 1 after 0.000 seconds.
[trivial_test] Best cost: 1.000000 candidate number: 0 after 0.000 seconds.
input circuit:
OPENQASM 2.0;
include "qelib1.inc";
qreg q[1];
u2(pi*0.000000,pi*1.000000) q[0];
u2(pi*0.000000,pi*1.000000) q[0];
output circuit:
OPENQASM 2.0;
include "qelib1.inc";
qreg q[1];
u2(pi*1.000000,pi*1.000000) q[0];
Please let me know if you need me to also post the detailed steps for compiling and running the test code.
Thanks for pointing out the issue, @hushaohan! We have fixed this issue in the commit 5f1ef67
. Now the output of the optimization should be:
output circuit:
OPENQASM 2.0;
include "qelib1.inc";
qreg q[1];
u2(pi*0.000000,pi*1.000000) q[0];
u2(pi*0.000000,pi*1.000000) q[0];
As you can see, even if the bug is fixed, currently Quartz
cannot cancel out the two u2(pi*0.000000,pi*1.000000) q[0];
. The reason is that Quartz
only generate symbolic transformation rules without constant parameters. For example, there is a rule generated by Quartz
:
u2(p0, p1) q[0]; u2(p2, p3) q[0]; = u2(p3, p1) q[0]; u2(p2, p0) q[0];
where p0, p1, p2, p3
are symbolic parameters that can take any
real value. However, Quartz
currently does not support rules that work only when the parameters take specific constant values (e.g. pi). So currently we cannot combine two u2(pi*0.000000,pi*1.000000) q[0];
into identity.
Enabling the generator to generate transformation rules that involves constants is in our plan.
We have fixed this issue in the commit
5f1ef67
.
Is the bug caused by an incorrect application of rotation merging to unsupported gates (rotation merging does not support u2)?
We are not calling rotation merging here. The reason of this bug is that we have a pass that eliminate rotation gates whose rotation angles are equal to 2kpi
based on an assumption that they are equivalent to identity. However, this assumption is only true for rx, ry, rz, u1
. This bug is caused by mistakenly eliminating u2(0, 0) q[0];
.
I see... u2(0, 0)
not equivalent to identity is kind of counterintuitive and Qiskit already deprecated U2
. u3(0, 0, 0)
is equivalent to identity.
I do want to mention that a similar test on u1
seems to work:
#include "gen_ecc_set.h"
#include "quartz/tasograph/tasograph.h"
using namespace quartz;
int main() {
int q = 1, m = 2, n = 2;
std::vector<GateType> gate_set{GateType::add, GateType::u1};
std::string ecc_name = "test_";
std::string ecc_fpath = ecc_name + "complete_ECC_set.json";
gen_ecc_set(gate_set, ecc_name, true, true, q, m, n);
const std::string qasm = R""""(
OPENQASM 2.0;
include "qelib1.inc";
qreg q[1];
u1(1) q[0];
u1(2) q[0];
)"""";
gate_set.insert(gate_set.end(), {
GateType::input_qubit, GateType::input_param
});
Context ctx(gate_set);
auto graph = Graph::from_qasm_str(&ctx, qasm);
auto optimized_graph = graph->optimize(
&ctx, ecc_fpath, "trivial_test", /*print_message=*/ true
);
std::cout << "\ninput circuit:" << std::endl;
graph->to_qasm(true, false);
std::cout << "\noutput circuit:" << std::endl;
optimized_graph->to_qasm(true, false);
return 0;
}
gives me
...
input circuit:
OPENQASM 2.0;
include "qelib1.inc";
qreg q[1];
u1(pi*0.31831) q[0];
u1(pi*0.63662) q[0];
output circuit:
OPENQASM 2.0;
include "qelib1.inc";
qreg q[1];
u1(pi*0.95493) q[0];
fwiw
EDIT: I suppose this is the result of a pass of rotation merging, as opposed to transformation?
Actually this is the result of applying transformation. Because there is a rule that u1(p0) q[0]; u1(p1) q[0];
equals to p2 = add(p0, p1); u1(p2) q[0];
. You can view it as u1(p0) q[0]; u1(p1) q[0]; = u1(p0 + p1) q[0];
. And in our current optimize
API rotation_merging
is not invoked.
@xumingkuan u3
is added in the latest commit.
Actually this is the result of applying transformation. Because there is a rule that
u1(p0) q[0]; u1(p1) q[0];
equals top2 = add(p0, p1); u1(p2) q[0];
. You can view it asu1(p0) q[0]; u1(p1) q[0]; = u1(p0 + p1) q[0];
. And in our currentoptimize
APIrotation_merging
is not invoked.
I see, thanks for the explanation!
It would be great if this would also work for u2
or u3
, or in general any customized gates with more than 1 parameter.
It would be great if this would also work for
u2
oru3
For u2
and u3
, the equivalences are not as obvious as u1
, and I believe we do support the equivalences (except for the ones using pi/2
here for now): https://quantumcomputing.stackexchange.com/questions/13480/optimize-chains-of-single-qubit-u1-u2-u3-gates-by-combining-them-into-a-single
It would be great if this would also work for
u2
oru3
For
u2
andu3
, the equivalences are not as obvious asu1
, and I believe we do support the equivalences (except for the ones usingpi/2
here for now): https://quantumcomputing.stackexchange.com/questions/13480/optimize-chains-of-single-qubit-u1-u2-u3-gates-by-combining-them-into-a-single
Thanks for the pointer.
My end goal is actually to use quartz to optimize circuits of other gatesets (e.g. containing the PhasedX
single-qubit two-param gate, and the ZZPhase
2-qubit single-param gate, as defined at https://cqcl.github.io/tket/pytket/api/optype.html) by supplying their corresponding matrix implementations in src/quartz/gate
and src/python/verifier/gates.py
. Is it currently possible?
Judging but the snippets below,
quartz/src/quartz/generator/generator.cpp
Line 331 in 9f88caa
quartz/src/quartz/generator/generator.cpp
Lines 466 to 467 in 9f88caa
I'm assuming at least parameterized 2-qubit gates are not supported as far as my goal is concerned?
Actually this is the result of applying transformation. Because there is a rule that
u1(p0) q[0]; u1(p1) q[0];
equals top2 = add(p0, p1); u1(p2) q[0];
. You can view it asu1(p0) q[0]; u1(p1) q[0]; = u1(p0 + p1) q[0];
. And in our currentoptimize
APIrotation_merging
is not invoked.I see, thanks for the explanation!
It would be great if this would also work for
u2
oru3
, or in general any customized gates with more than 1 parameter.
Btw, are these rules automatically discovered or manually added in advance? Where/how are they stored? TIA!
I'm assuming at least parameterized 2-qubit gates are not supported as far as my goal is concerned?
It is not supported for now simply because we haven't used them in any benchmarks/experiments, and it would be easy to support them. Feel free to (open PR to) add more gates in src/quartz/gate
and we will take care of the changes in generator.cpp
.
I'm assuming at least parameterized 2-qubit gates are not supported as far as my goal is concerned?
It is not supported for now simply because we haven't used them in any benchmarks/experiments, and it would be easy to support them. Feel free to (open PR to) add more gates in
src/quartz/gate
and we will take care of the changes ingenerator.cpp
.
Sounds great, thanks!
Btw, quartz
already has cu1
and cp
, both are 2-qubit and single-param gates as well.
Actually this is the result of applying transformation. Because there is a rule that
u1(p0) q[0]; u1(p1) q[0];
equals top2 = add(p0, p1); u1(p2) q[0];
. You can view it asu1(p0) q[0]; u1(p1) q[0]; = u1(p0 + p1) q[0];
. And in our currentoptimize
APIrotation_merging
is not invoked.I see, thanks for the explanation!
It would be great if this would also work foru2
oru3
, or in general any customized gates with more than 1 parameter.Btw, are these rules automatically discovered or manually added in advance? Where/how are they stored? TIA!
These are automatically discovered by Quartz (despite probably being manually found in the Stackexchange link). They are generated and stored in the ecc_fpath
file in your code snippet. The Json file is grouped by the ECCs and inside the ECCs are circuits; there is some metadata at the beginning of each circuit and the following are the gates. For example, ["u2", ["Q0"],["Q0", "P2", "P1"]]
means taking input qubit 0, parameter 2, parameter 1, and output qubit 0. (We may change the format of the Json file as we develop.)
Btw,
quartz
already hascu1
andcp
, both are 2-qubit and single-param gates as well.
Good point! I guess no one has used them to generate an ECC set yet..
Btw,
quartz
already hascu1
andcp
, both are 2-qubit and single-param gates as well.Good point! I guess no one has used them to generate an ECC set yet..
May I ask if you guys have any recent plans on adding support for these?
Also an unrelated side question: for the graph->optimize(...)
call in my earlier snippet, is there a way to enable openmp parallelization for the heavy-lifting part? From my playing with it, seems only a single core is busy crunching numbers with the rest idle :)
Btw,
quartz
already hascu1
andcp
, both are 2-qubit and single-param gates as well.Good point! I guess no one has used them to generate an ECC set yet..
May I ask if you guys have any recent plans on adding support for these?
Sure, should be supported in my latest PR.
Also an unrelated side question: for the
graph->optimize(...)
call in my earlier snippet, is there a way to enable openmp parallelization for the heavy-lifting part? From my playing with it, seems only a single core is busy crunching numbers with the rest idle :)
It is indeed possible but non-trivial to make use of openmp. Because of the randomness of the search, what we were doing was simply running the same optimization multiple times to compute the best (or average) result, and this is naturally parallelizable.
I believe we're all busy with things like paper deadlines within the following month and we don't plan to support openmp recently. If you feel like openmp is useful, welcome to open PRs to contribute!