HdrHistogram/HdrHistogram_py

Histograms are encoded with the wrong endianness on PyPy

Julian opened this issue · 7 comments

Moving this issue from HdrHistogram/HdrHistogramJS#2

(I haven't quite located where the bug is here yet, or if it belongs upstream but it definitely doesn't seem to belong where I first left it).

virtualenv --python pypy hdrh && hdrh/bin/pip install hdrhistogram && hdrh/bin/python -c 'from hdrh.histogram import HdrHistogram; print HdrHistogram(1, 2, 3).encode()'; rm -rf hdrh

produces a histogram which is encoded using the wrong endianness (and therefore cannot be decoded with other HDR histogram tools.)

#12 is tangentially related.

OK, I'm not 100% sure yet given how many things here are outside my wheelhouse, but I'm guessing what's going wrong here is in https://github.com/HdrHistogram/HdrHistogram_py/blob/master/hdrh/codec.py#L231

That appears to memmove a thing into a thing, but self.payload isn't some memory, it's just a regular ol' Python object, so what things live at its memory address is implementation specific, I'm surprised that doesn't even blow up more spectacularly, I guess PyPy's CPython API emulation must have something resembling the right bytes at that memory, but with a different endianness.

I think the right fix though must involve not actually poking at objects' internal memory but exposing the bytes that are needed in a more portable way?

(Sorry, more train-of-thought debugging: that's not in fact a random python object, it's a ctypes.BigEndianStructure -- that sounds like it'd change things quite a bit. /me keeps digging)

OK, it looks like the upstream bug is https://bitbucket.org/pypy/pypy/issues/1213/inheriting-from-ctypesbigendianstructure which boils down to "PyPy only supports native byte order."

I'm not sure whether anything's changed recently in PyPy to make changing that easier, but I suspect that unless something has, the right upstream solution is to actually just plain make this fail harder (by removing BigEndianStructure entirely), so that doesn't help here I think. I'm not sure what a good solution is beyond #12 but will keep prodding.

Closing this issue as it is tracked by #12
Added limitation to the README.

It's actually been fixed upstream! Should be working in PyPy 5.9 I believe (though I haven't verified it quite yet).

Awesome! Perhaps we should close #12 then?

I think that would still be nice to do independently -- it will be faster (and simpler) on PyPy. Or a CFFI version, which should be fast on both CPython and PyPy.

But given that I don't think I'm gonna have a chance to get back to this anytime soon you might want to close it anyhow :D