Bayesian techniques for Randomized Benchmarking (cont.)

Question

ShellyGarion opened this issue 3 years ago · 11 comments

Description

Enhance the current Randomized Benchmarking experiments in Qiskit-Experiments with advanced statistical and Bayesian techniques, based on the following papers:

Mentor/s

Shelly Garion (@ShellyGarion), Research Staff Member at IBM Research Haifa, Qiskit developer.

Type of participant

Knowledge in statistical and Bayesian methods

Number of participants

1-2

Deliverable

A pull request with these advanced methods merged into the Qiskit-Experiments Randomized Benchmarking code.

Answer 1 · 2021-08-22T13:33:17.000Z

@pdc-quantum @shesha-raghunathan assigned.

Answer 2 · 2021-08-22T13:35:14.000Z

Ref. qiskit-advocate/qamp-spring-21#20

Answer 3 · 2021-09-22T10:27:46.000Z

@pdc-quantum Can you please comment in the issue so that I can assign you?

Answer 4 · 2021-09-22T17:21:23.000Z

Hi @HuangJunye
Nice 😊 to be here
Please assign me

Answer 5 · 2021-09-30T21:05:06.000Z

Here is my 2-slide proposal for Mid-term review (Oct 7, 2021)
# {34} {Bayesian techniques for Randomized Benchmarking (cont.)}.pdf

Answer 6 · 2021-11-08T11:00:41.000Z

Update for Checkpoint 2

What’s new since Checkpoint 1?

Modification of the frequentist model from Qiskit experiments:
When the results of the Bayesian model were compared with those of the frequentist model in Qiskit experiments, a bug was suspected in the latter.
qiskit-community/qiskit-experiments#428
The symptom is excessively wide error bounds on the error per gate (EPG), which do not vary with the number of circuits run at each sequence length. A fix has been submitted as a PR:
qiskit-community/qiskit-experiments#472
The results presented below for the simulator example and the hardware testing were obtained with the corrected code.

Hardware testing of 24 cx gates by 2-qubit interleaved RB in six 5-qubit devices:
We used a pilot experimental protocol adapted to the provider’s limit of 75 circuits in fair share queuing at that time.
Each experiment consisted of 2-qubit interleaved RB with “lengths” set to [1, 30, 60, 100, 150] and “num_samples” set to 7 (thus 70 circuits per job).
Reference values were obtained by cubic spline interpolation from the daily hardware calibrations. For only 7 out of the 24 tested gates, the calibration drift was smooth in the four-day period centered on each experiment (group 1). For the remaining 17 gates there was an oscillation (group 2).
The correlation between the reference values and the RB calculated EPG was studied.
R² for the frequentist and the Bayesian model, respectively, was:

.849 and .815 in all 24 gates,
.976 and .981 in group 1,
.730 and .640 in group 2.
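
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the R² computation, assuming R² means the squared Pearson correlation between the reference values and the RB estimates. The function name and the sample values are hypothetical; the numbers are made up for illustration and are not data from this project.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Made-up EPG values, for illustration only (not measured data).
reference_epg = [0.008, 0.010, 0.012, 0.015]
rb_epg = [0.009, 0.010, 0.013, 0.014]
print(round(r_squared(reference_epg, rb_epg), 3))
```

The same function can be applied separately to each group of gates and to each statistical model.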

In the total sample, the relative error on EPG was lower with the Bayesian model than with the frequentist model (mean ± sd: 7.6% ± 0.02% vs. 10.2% ± 0.19%, p<.001).
Bayesian RB appears to be a reliable method providing slightly narrower error bounds than RB based on a least-squares fit. A credible interval is also inferred for EPG, with its own useful upper bound.
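
To illustrate what a credible interval with an upper bound means here, the following is a minimal sketch that uses a simple grid approximation, not the MCMC model used in this project. It assumes a survival probability of the form a * alpha**m + b with a and b treated as known, a uniform prior on alpha, and binomial counts. All function names and the synthetic counts are hypothetical.

```python
import math


def alpha_posterior(lengths, successes, shots, a, b, grid_size=20001):
    """Grid posterior for the RB decay parameter alpha under a uniform prior.

    Assumption of this sketch: the survival probability at sequence
    length m is a * alpha**m + b, with a and b known.
    """
    alphas = [(i + 0.5) / grid_size for i in range(grid_size)]  # inside (0, 1)
    log_like = []
    for alpha in alphas:
        total = 0.0
        for m, k in zip(lengths, successes):
            p = a * alpha ** m + b
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            total += k * math.log(p) + (shots - k) * math.log(1.0 - p)
        log_like.append(total)
    peak = max(log_like)  # subtract the peak to avoid overflow in exp()
    weights = [math.exp(v - peak) for v in log_like]
    norm = sum(weights)
    return alphas, [w / norm for w in weights]


def credible_interval(alphas, probs, mass=0.95):
    """Equal-tailed credible interval read off the cumulative posterior."""
    low_tail = (1.0 - mass) / 2.0
    high_tail = 1.0 - low_tail
    cumulative = 0.0
    lower = upper = None
    for alpha, p in zip(alphas, probs):
        cumulative += p
        if lower is None and cumulative >= low_tail:
            lower = alpha
        if cumulative >= high_tail:
            upper = alpha
            break
    return lower, upper


# Synthetic two-qubit example (made-up numbers, idealized a = 3/4, b = 1/4).
true_alpha, a, b, shots = 0.98, 0.75, 0.25, 1024
lengths = [1, 30, 60, 100, 150]
successes = [round(shots * (a * true_alpha ** m + b)) for m in lengths]

alphas, probs = alpha_posterior(lengths, successes, shots, a, b)
low, high = credible_interval(alphas, probs)
d = 4  # Hilbert space dimension for two qubits
epc_upper = (d - 1) / d * (1.0 - low)  # lower bound on alpha gives upper bound on EPC
print(low, high, epc_upper)
```

The lower end of the interval on alpha translates directly into an upper credible bound on the error per Clifford, which is the kind of bound mentioned above.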

Check_point_2_^LN {34} {Bayesian techniques for Randomized Benchmarking (cont.)}.pdf

Visual for the project

Visual^LN {34} {Bayesian techniques for Randomized Benchmarking (cont.)}.pptx
Visual^LLN {34} {Bayesian techniques for Randomized Benchmarking (cont.)}.pdf


Answer 7 · 2021-12-08T20:55:38.000Z

Slides for Final Showcase
Dec 9 N {34} {Bayesian techniques for Randomized Benchmarking (cont.)}.pdf
Dec 9 N {34} {Bayesian techniques for Randomized Benchmarking (cont.)}.pptx

Answer 8 · 2021-12-19T10:19:34.000Z

Final Showcase (1/4)

This text refers to the presentation slide here.

Qiskit experiments gave more opportunity for hardware testing of the Bayesian statistical model (BSM) and comparison with the frequentist statistical model (FSM), thanks to:

The multi_curve_fit function, which allows a frequentist interleaved RB protocol to be completed in a single run, as is the case for its Bayesian counterpart.
The automatic management of fair share queuing by Qiskit experiments, which allows hardware submission of RB experiments divided into up to five jobs, each comprising a maximum of one hundred circuits.
The successful merging of PR#472, which levels the playing field for the error bound estimates.

One- and two-qubit standard and interleaved RB could then be tested and compared for the two models.
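
As a reminder of how the two modes turn fitted decay parameters into error rates, here is a minimal sketch using the relations commonly given in the RB literature: EPC = (d - 1)/d * (1 - alpha) for standard RB and EPG = (d - 1)/d * (1 - alpha_int/alpha_ref) for interleaved RB, with d = 2**n. The function names are hypothetical and this is not the Qiskit Experiments implementation.

```python
def epc_from_alpha(alpha, num_qubits):
    """Error per Clifford from the standard RB decay parameter."""
    d = 2 ** num_qubits
    return (d - 1) / d * (1.0 - alpha)


def interleaved_epg(alpha_ref, alpha_int, num_qubits):
    """Error per gate from the reference and interleaved decay parameters."""
    d = 2 ** num_qubits
    return (d - 1) / d * (1.0 - alpha_int / alpha_ref)


# Illustrative decay parameters (made up, not measured values).
print(epc_from_alpha(0.99, num_qubits=1))
print(interleaved_epg(alpha_ref=0.99, alpha_int=0.98, num_qubits=2))
```

In both statistical models the quantity being estimated is the same; they differ in how the decay parameters and their uncertainties are inferred.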

Answer 9 · 2021-12-19T10:19:43.000Z

Final Showcase (2/4)

Hardware Demonstration of a Bayesian Advantage in RB:

One-qubit interleaved RB (slide 3):

All 15 qubits from three 5-qubit devices

Protocol:

Sixteen equally spaced sequence lengths
Maximal sequence length: 1500 Cliffords
Fifteen sampled RB sequences
Five jobs on fair share queuing for a total of 480 circuits
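
The circuit and job counts of this protocol follow from simple arithmetic, sketched below. It assumes each sampled sequence is run twice in interleaved mode (once as reference, once with the target gate interleaved) and uses the limit of one hundred circuits per job mentioned earlier. The helper names are hypothetical.

```python
import math


def circuit_count(num_lengths, num_samples, interleaved=True):
    """Number of circuits in one RB experiment.

    Interleaved RB is assumed to run a reference and an interleaved
    circuit for every (length, sample) pair.
    """
    per_mode = num_lengths * num_samples
    return 2 * per_mode if interleaved else per_mode


def jobs_needed(num_circuits, max_circuits_per_job=100):
    """Jobs required when each job holds a limited number of circuits."""
    return math.ceil(num_circuits / max_circuits_per_job)


one_qubit = circuit_count(num_lengths=16, num_samples=15)
print(one_qubit, jobs_needed(one_qubit))  # 480 5
```

The Checkpoint 2 pilot protocol gives `circuit_count(5, 7)`, that is 70 circuits, which fit in a single job under the earlier limit of 75.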

Results:

Correlation between Pauli-X EPG by FSM and BSM: R² = .9964
Correlation between Pauli-X EPG by BSM and:
- reference (linear interpolation): R² = .479
- reference (cubic spline interpolation): R² = .576
Relative error on EPG: 3.97% by BSM vs. 5.45% by FSM, p<0.01
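
For the linear-interpolation reference, the idea can be sketched as follows, assuming calibration values are available as (time, EPG) pairs sorted by time. The function name and the sample values are hypothetical. The cubic spline variant needs more calibration points and a library routine such as SciPy's `CubicSpline`, which is not shown here.

```python
def linear_reference(times, values, t):
    """Linearly interpolate calibration values at time t.

    `times` must be sorted in ascending order and t must lie within range.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs")
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the calibration window")
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return values[i] + frac * (values[i + 1] - values[i])


# Made-up daily calibrations (hours, EPG), for illustration only.
hours = [0, 24, 48]
calibrated_epg = [0.010, 0.012, 0.011]
print(linear_reference(hours, calibrated_epg, 12))
```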

Two-qubit interleaved RB (slide 4):

All 16 cNOT gates from four 5-qubit devices with EPG by calibration less than .02

Protocol:

Sixteen equally spaced sequence lengths
Maximal sequence length: 253 Cliffords
(199 if EPG from calibration > 1.5%)
Twenty sampled RB sequences
Four jobs on fair share queuing for a total of 400 circuits

Results:

Correlation between cNOT EPG by FSM and BSM: R² = .991
Correlation between cNOT EPG by BSM and:
- reference (linear interpolation): R² = .841
- reference (cubic spline interpolation): R² = .775
Relative error on EPG: 4.47% by BSM vs. 6.29% by FSM, p<0.001

Answer 10 · 2021-12-19T10:49:04.000Z

Final Showcase (3/4)

Hardware comparison between standard mode and interleaved mode

Comparison between standard mode and interleaved mode for one-qubit gate:

3 qubits (qubit [0] of each of three devices) with Pauli-X EPG by calibration less than .00025.

Protocol: designed to require nearly equal quantum computational resources

Ten equally spaced sequence lengths
Maximal sequence length: 1500 Cliffords
Twenty sampled RB sequences for standard mode vs. two times ten sequences for interleaved mode
Four jobs on fair share queuing (two for standard mode and two for interleaved mode) for a total of 400 circuits (200 per mode)

Results:

The relative error on EPC is reported in the following table. This value was of the same order of magnitude for both standard and interleaved modes and for both statistical models. No systematic advantage of BSM in narrowing error bounds was observed in this small series.

Table 1: Relative error on EPC for qubit [0] of three hardware backends

Comparison between standard and interleaved mode for two-qubit gate:

3 pairs of qubits on ibmq_lima with cNOT EPG by calibration less than .015.

Protocol: designed to require nearly equal quantum computational resources

Ten equally spaced sequence lengths
Maximal sequence length: 1603 Cliffords for one-qubit RB and 253 Cliffords for two-qubit RB
Ten sampled RB sequences for all experiments
Five jobs on fair share queuing (two one-qubit and one two-qubit for standard mode, two for interleaved mode) for a total of 500 circuits

Results:

The relative error on EPC is reported in the following table. This value was of the same order of magnitude for both standard and interleaved modes and for both statistical models. BSM reported in all cases a lower relative error than FSM in this small series.

Table 2: Relative error on EPC for cx gates on ibmq_lima

Answer 11 · 2021-12-19T10:54:47.000Z

Final Showcase (4/4)

Additional observations (slide 5):

Thanks to the multiple chains in the MCMC algorithm, BSM can reveal accessory minima missed by FSM: they show up as a double-peaked posterior distribution.
Reference values must be considered with caution because significant fluctuations between two daily hardware calibrations cannot be excluded.
Fair share queuing may involve long delays between jobs. It is therefore important to submit all the jobs of an experiment within the shortest possible time and to avoid running experiments when the queue is long. This explains the low number of gates tested in our comparisons of RB modes.

Conclusions:

RB Bayesian methods can be applied at the sole cost of additional classical computational resources.

In the present hardware experiments, they often narrowed the error bounds of the RB estimates. Furthermore, obtaining a credible lower limit for gate error is a valuable inherent feature of the Bayesian approach.

Visualizing the posterior distributions of the parameters gives more insight into the model properties. When they give results similar to those of FSM, these methods can serve as an independent sanity check of the routinely used frequentist models. In addition, the MCMC algorithm can detect possible accessory minima affecting the fitting process.