QuadPrec scalar repr isn't ideal for display in arrays
I think it'd be a lot nicer if we used the string form instead of the repr form when formatting arrays:
>>> np.array([str(pi), str(pi+1)], dtype=npq.QuadPrecDType())
array([QuadPrecision('3.1415926535897932384626433832795e+000', backend='sleef'),
       QuadPrecision('4.1415926535897932384626433832795e+000', backend='sleef')],
      dtype=QuadPrecDType(backend='sleef'))
The current output repeats backend='sleef' three times and uses the unnecessarily verbose format with QuadPrecision showing up explicitly.
This is happening because of this choice:
>>> str(npq.QuadPrecision(str(pi)))
'3.1415926535897932384626433832795 '
>>> repr(npq.QuadPrecision(str(pi)))
"QuadPrecision('3.1415926535897932384626433832795e+000', backend='sleef')"
IMO it would be better if the array repr used the string form instead of the repr format.
Although that said, I'm not totally sure why there's trailing whitespace in the string format. That might be another bug.
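To illustrate the difference without depending on numpy_quaddtype, here is a minimal sketch using an object array and a made-up `Verbose` class (hypothetical, not part of any library). NumPy formats object-array elements with their repr by default, and `np.array2string` accepts a `formatter` override, which gives roughly the effect I'm after:

```python
import numpy as np


class Verbose:
    """Hypothetical stand-in for a scalar type with a noisy repr."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"Verbose('{self.text}', backend='demo')"

    def __str__(self):
        return self.text


arr = np.array([Verbose("3.14"), Verbose("4.14")], dtype=object)

# Default: each element is rendered with its repr.
default_out = np.array2string(arr)

# Override: render object elements with str instead.
compact_out = np.array2string(arr, formatter={"object": str})

print(default_out)
print(compact_out)
```

This only mimics the behaviour with an object array; the quad precision dtype goes through its own scalar type, so treat it as an analogy rather than the actual code path.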
+1 agreed! When debugging I often have to cast back to float64 just so that the output is (more) readable
@ngoldbaum Is there a specific function that numpy calls when formatting elements inside of an array, or does it use the scalar object repr?
It just uses the scalar object repr. This is actually all implemented in python. The function that chooses the scalar formatter is here:
And I think we're not getting a nice formatter because the quad precision scalar doesn't subtype np.floating, so we don't get FloatFormatter.
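A simplified sketch of that subclass check, as I understand it. `FakeQuadScalar` and `gets_float_formatter` are hypothetical names for illustration only, and the real selection logic in NumPy handles more cases than this:

```python
import numpy as np


class FakeQuadScalar:
    """Hypothetical scalar type that does not inherit from np.floating."""


def gets_float_formatter(scalar_type):
    # Simplified assumption: the float formatter is picked only when the
    # scalar type is a subclass of np.floating; otherwise the fallback
    # (the scalar repr) is used.
    return issubclass(scalar_type, np.floating)


print(gets_float_formatter(np.float64))
print(gets_float_formatter(np.longdouble))
print(gets_float_formatter(FakeQuadScalar))
```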
That said, it might be nice to have a user dtype plugin slot to add an entry to that dict. Maybe @seberg has an opinion too.
Yeah, we should add a way to format elements beyond the scalar repr, since that might include silly things (and doesn't have a precision context either clearly).
Mostly, we need to decide how.
(Format a single value, or many? What to pass in? Maybe even do it in Python for simplicity; at least for now, a default C slot could always call the same Python function. Or maybe just add/register a Formatter?)
Anyway, happy to add something and also happy about a suggestion, the simpler the better :).
My simplest idea would be to print the new custom dtypes using their str instead of their repr. So the numpy code would just get one more if branch that would need to check if dtype uses the new dtype API.
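Roughly what I have in mind, as a sketch only. `pick_element_formatter` and `FakeNewStyleDType` are hypothetical, and the check relies on the private `_legacy` attribute of DType classes, which is an assumption on my part and not a public API:

```python
import numpy as np


def pick_element_formatter(dtype):
    """Hypothetical helper: str for new-style user dtypes, repr otherwise."""
    # ASSUMPTION: DType classes expose a private `_legacy` flag. If it is
    # missing, fall back to treating the dtype as legacy (keep the repr).
    is_legacy = getattr(type(dtype), "_legacy", True)
    return repr if is_legacy else str


class FakeNewStyleDType:
    """Hypothetical stand-in for a dtype written against the new DType API."""

    _legacy = False


print(pick_element_formatter(np.dtype("float64")).__name__)
print(pick_element_formatter(FakeNewStyleDType()).__name__)
```

Built-in dtypes already get their own formatters earlier in the selection, so the extra branch would only matter for the fallback case.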
Honestly, this is probably a better default than what we use now. I'd be for making this change in NumPy.
That said, I think because of NumPy's stability guarantees we'd still need a way to let people select the old repr choice as well. At that point we might as well make it arbitrarily configurable...
Is there really any dtype other than StringDType that this would affect (w.r.t. changing the default)? But it would be nice to have the option either way, I agree.
Not that I know of. And StringDType already has a special code path for the formatter, so it wouldn't be affected. So maybe it is OK. I just worry a little about anyone who has written a user dtype and not publicized it, since it is technically stable...
But I guess they're probably annoyed by this choice too!
Anyway @juntyr if you're interested, we'd appreciate your help on this with a NumPy PR.
I suppose I'm somewhat of the opinion that if we could, in theory, talk to everyone affected in person, it's probably OK.
Although that said, I'm not totally sure why there's trailing whitespace in the string format. That might be another bug.
Again, sorry guys, it seems I didn't receive the notifications for new issues.
@ngoldbaum thanks for catching this. The trailing whitespace actually comes from the dragon4.c logic, which adds padding when the fractional part is shorter than SLEEF_QUAD_DIG. PR #133 resolves this.
Testing from the code of PR #165
In [4]: np.array([str(pi), str(pi+1)], dtype=npq.QuadPrecDType())
Out[4]: array([3.14159265, 4.14159265], dtype=QuadPrecDType(backend='sleef'))
@ngoldbaum now it seems better
Should we keep this issue open, given that #165 seems to give good formatting?
@SwayamInSync Does the new system allow formatting in full precision? If so, I'd propose to close this issue
There is a default precision set when printing arrays, so very small numbers might not be shown at their full precision by default. One can adjust the options using set_printoptions (we do this inside the NumPy tests).
So I think this is working as expected
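As a quick illustration of the precision option, here is a sketch using a plain float64 array as a stand-in (not a quad precision array, so the digit counts differ). The `precision=15` value is just an example choice:

```python
import numpy as np

arr = np.array([np.pi, np.pi + 1])  # float64 stand-in, not a quad array

saved = np.get_printoptions()  # remember the current global options
default_repr = repr(arr)       # rendered with the default precision

np.set_printoptions(precision=15)  # global change
try:
    wide_repr = repr(arr)          # more digits are shown now
finally:
    np.set_printoptions(**saved)   # restore the previous options

print(default_repr)
print(wide_repr)
```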
Do we already have a test that checks that set_printoptions works with the quad dtype?
In the new issue #194 I pointed out how it works. We don't have any test for this, as it's a hack to make the printing work for extreme quad precision cases. For values in the normal number range, the default is fine.
Also, if someone doesn't want to set the options globally, we can use the context manager instead:
with np.printoptions(...):
    ...  # quad precision print code
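A small sketch of the scoping behaviour, again with a float64 array as a stand-in and `precision=15` as an arbitrary example value. The options only apply inside the `with` block and are restored afterwards:

```python
import numpy as np

arr = np.array([np.pi, np.pi + 1])  # float64 stand-in, not a quad array

outside = np.array2string(arr)

with np.printoptions(precision=15):
    # Options are changed only for the duration of this block.
    inside = np.array2string(arr)
    precision_inside = np.get_printoptions()["precision"]

# After the block, the previous options are back in effect.
precision_after = np.get_printoptions()["precision"]

print(outside)
print(inside)
```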