quantumlib/Stim

Different runtime of stim.Tableau.prepend and stim.Tableau.append


Hi! I noticed that the method stim.Tableau.prepend is much faster than stim.Tableau.append. This is especially evident when prepending (appending) a small tableau to a large one. From my understanding, though, the two methods should have similar runtimes. Here is an example run in a Jupyter notebook:

import stim

N = 400
tab1 = stim.Tableau.random(N)
tab2 = stim.Tableau.random(2)
%timeit tab1.prepend(tab2, [0, 1])  # 1.79 µs ± 25.6 ns per loop
%timeit tab1.append(tab2, [0, 1])   # 343 µs ± 3.46 µs per loop

Is it supposed to be like this? Thanks in advance!

Internally, the contents of columns are contiguous in memory and aligned so that SIMD instructions can operate on hundreds of bits at a time. That means rows aren't contiguous, so row operations have to work one Pauli at a time. So the ~100x difference in speed that you're measuring is actually roughly consistent with the internal details.
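
As a rough way to see the layout effect at a few sizes, here is a small measurement sketch (it uses the standard timeit module instead of the %timeit magic; the sizes and repetition count are arbitrary choices for illustration):

import timeit
import stim

small = stim.Tableau.random(2)
for n in (100, 200, 400, 800):
    big = stim.Tableau.random(n)
    # Time 1000 in-place prepends/appends of the 2-qubit tableau onto the n-qubit one.
    t_pre = timeit.timeit(lambda: big.prepend(small, [0, 1]), number=1000)
    t_app = timeit.timeit(lambda: big.append(small, [0, 1]), number=1000)
    print(f"n={n}: prepend {t_pre * 1e3:.2f} µs/call, append {t_app * 1e3:.2f} µs/call")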

If you need to do a lot of appends in sequence, you can invert the tableaus and then do prepends instead of appends. If you don't care about the sign of the result, you can use stim.Tableau.inverse(unsigned=True), which is O(n^2) instead of O(n^3).
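
For concreteness, here is a minimal sketch of that invert-then-prepend trick, assuming the usual composition convention (append applies the small tableau after the existing one, prepend applies it before, so appending g to T is the same as prepending g^-1 to T^-1). The helper name append_many_via_prepends is just for illustration, and unlike append it returns a new tableau rather than mutating the input:

import stim

def append_many_via_prepends(base: stim.Tableau, operations) -> stim.Tableau:
    # operations: iterable of (small_tableau, targets) pairs to append in order.
    # Appending g to T gives "T then g", whose inverse is "g^-1 then T^-1",
    # i.e. prepending g^-1 onto T^-1. So all the work happens on the inverse,
    # where prepends are the fast column-oriented operation, and we invert back once.
    inv = base.inverse()  # or base.inverse(unsigned=True) if signs don't matter
    for gate, targets in operations:
        inv.prepend(gate.inverse(), targets)
    return inv.inverse()

# Usage: should match repeatedly calling tab1.append(tab2, [0, 1]).
N = 400
tab1 = stim.Tableau.random(N)
tab2 = stim.Tableau.random(2)
combined = append_many_via_prepends(tab1, [(tab2, [0, 1])] * 100)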

Internally, the C++ code has a "temporarily transposed tableau" class, used as part of doing measurements, where row ops are fast. It's also likely possible to optimize the append methods (e.g. extract the relevant bits once, go fast, then put the bits back).

I understand, thanks for the quick reply! I'll try the approach you suggest and see if it improves performance.