tomerfiliba-org/reedsolomon

Question about Performance

FranzForstmayr opened this issue · 2 comments

From your Readme:

The codec has quite reasonable performances if you either use PyPy on the pure-python implementation (reedsolo.py) or either if you compile the Cython extension creedsolo.pyx (which is about 2x faster than PyPy). You can expect encoding rates of several MB/s.

Is this still valid?
I just ran a performance evaluation on three different Python versions (3.7, 3.8, 3.9) with the following code.

import sys
import numpy as np
from reedsolo import RSCodec
import perfplot

name = f'perf_v{sys.version_info.major}.{sys.version_info.minor}'

def func(rscoder, array):
    enc = rscoder.encode(array)
    return rscoder.decode(enc)[0]

codecs = [
    RSCodec(8),
    RSCodec(16),
]

out = perfplot.bench(
    setup = lambda n: np.random.randint(0,255,size=n, dtype=np.uint8),
    kernels = [
        lambda a: func(codec, a) for codec in codecs
    ],
    labels = [codec.nsym for codec in codecs],
    n_range = [2 ** k for k in range(20)],
)
out.show()
out.save(name + ".png", transparent=True, bbox_inches="tight")

I get a maximum of 100 kB/s (for encoding and decoding together).
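For readers who want to turn a timing into a rate like the one quoted above, here is a minimal sketch. The helper names `rate_mb_per_s` and `best_throughput` are hypothetical, the workload is a stand-in XOR loop and not the reedsolo codec, and 1 MB is taken as 10**6 bytes.

```python
import time


def rate_mb_per_s(n_bytes, seconds):
    """Convert a payload size and an elapsed time into MB/s (1 MB = 10**6 bytes)."""
    return n_bytes / seconds / 1e6


def best_throughput(func, payload, repeats=3):
    """Time func(payload) several times and return the best rate in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(payload)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return rate_mb_per_s(len(payload), max(best, 1e-12))


def stand_in_workload(data):
    """Placeholder for an encode/decode round trip; not the real codec."""
    return bytes(b ^ 0x5A for b in data)


payload = bytes(range(256)) * 400  # 102,400 bytes of test data
measured = best_throughput(stand_in_workload, payload)
print(f"{measured:.2f} MB/s")
```

As a point of reference, 1,000,000 bytes processed in 10 seconds corresponds to 0.1 MB/s, i.e. 100 kB/s.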

Is this the expected speed? Tested on Ubuntu 20.04, with Cython installed.

Here is the output of the three perfplots.

[Figures: perfplot results for Python 3.7, 3.8, and 3.9 (perf_v3.7.png, perf_v3.8.png, perf_v3.9.png)]

I expected the cythonized function to be faster. However, the number of ECC symbols does not seem to matter here.
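A side note on the benchmark script: the kernels are built with `lambda a: func(codec, a) for codec in codecs`, and Python closures look up `codec` when the lambda is called, not when it is created. Every kernel therefore ends up using the last codec in the list, which may explain why the number of ECC symbols appears to make no difference. This is offered as a possible cause, not something confirmed in the thread. The sketch below reproduces the effect with plain integers standing in for the codecs (the names `late` and `bound` are illustrative only).

```python
nsyms = [8, 16]  # stand-ins for RSCodec(8) and RSCodec(16)

# Late binding: both lambdas read `n` at call time, after the loop has ended.
late = [lambda x: x + n for n in nsyms]

# Early binding: a default argument captures the current value of `n`.
bound = [lambda x, n=n: x + n for n in nsyms]

late_results = [f(0) for f in late]
bound_results = [f(0) for f in bound]

print(late_results)   # [16, 16]
print(bound_results)  # [8, 16]
```

Applied to the script, the equivalent change would be `lambda a, codec=codec: func(codec, a)`, assuming perfplot passes the setup output to each kernel as a single positional argument.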

PS:
To reproduce with Python 3.7, you'll have to install perfplot==0.9.6

Follow-up on this: thank you very much @FranzForstmayr for your code snippet. I have reworked it a bit to add the Cythonized extension, and it is now merged in tests/perf.py. However, as you pointed out, perfplot does not show any performance difference between the cythonized extension and pure Python. This is very strange, and I don't know the reason.

Anyway, I have recently reworked the cythonized extension, and I can confirm that it now runs much faster than before, at 12.5 MB/s encoding on my 5-year-old laptop. I tested the performance with another tool I made here, which is the one I used in the past to derive the speed results cited above and in the README, so the comparison is on a comparable basis (although the old tests were done under Python 2.7, and the new ones under Python 3.10).

I have not yet tested the results with PyPy, but I also expect an improvement there, since I have merged some of the optimizations I made for Cython into the pure Python version too (such as pre-allocating bytearrays).
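To illustrate what pre-allocating a bytearray means in general, here is a small sketch. It is not taken from reedsolo.py or creedsolo.pyx; the function names and the XOR operation are hypothetical stand-ins, and any speed-up depends on the interpreter or compiler in use.

```python
def xor_append(data, key):
    """Grow the output one byte at a time."""
    out = bytearray()
    for b in data:
        out.append(b ^ key)
    return out


def xor_preallocated(data, key):
    """Allocate the output once at its final size, then fill it by index."""
    out = bytearray(len(data))  # zero-filled buffer of the final length
    for i, b in enumerate(data):
        out[i] = b ^ key
    return out


sample = bytes([0x00, 0x01, 0x7F])
grown = xor_append(sample, 0xFF)
filled = xor_preallocated(sample, 0xFF)
print(grown == filled)
```

Both functions produce the same bytes; the second avoids repeated resizing and gives a fixed-size buffer that can be indexed directly.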

I will now close this issue, as the cythonized extension reliably provides > 10 MB/s of encoding speed, and more improvements are under way. Please let me know if you still encounter any issues.