JSON compliant to_chars?
Closed this issue · 8 comments
I'd like to use dragonbox for serializing floats in Glaze. What I'm particularly looking for is a JSON compliant to_chars
implementation that has the shortest string result. It seems like dragonbox's to_chars
will print out ...E0
when it is not necessary.
The other requirement for JSON compliance is that infinity and nan should be printed as "null"
.
Could options to support these features be added to dragonbox or should I fork the code and edit it myself?
I'd also like the to_chars
to be header only, but I don't mind doing this conversion myself and keeping it up to date with dragonbox
.
Thanks for all the hard work you've put into this great library!
Also, a number like 10.1 right now prints as 1.01E1, which adds 2 additional characters to the output, and it is harder for users to read.
I'm curious if you know what the performance impact of supporting this cleaner output would be?
Hi, thanks a lot for your interest!
Since this is a duplicate of #35 I'm closing this, but please feel free to discuss further in this thread (or in #35) if you want.
To answer your specific questions:
It seems like dragonbox's
to_chars
will print out...E0
when it is not necessary.
Also, a number like 10.1 right now prints as 1.01E1
You're right, please read my answer in #35.
Could options to support these features be added to dragonbox or should I fork the code and edit it myself?
Maybe in the future, but not planned at this moment. PR is welcome though!
Ideally, a hypothetical to_chars
should do what std::to_chars
do (except for the bound-checking API which I dislike) , i.e., to print a string with the least number of characters. But there is a nontrivial issue here: there are cases when fixed-point notation yields smaller number of characters, but the number of digits that will appear in the integer part of the fixed-point output exceeds the number of digits in the decimal significand that Dragonbox finds out. The simplest solution is to fill out missing digits with 0
, which retains the roundtrip gurantee and the shortness guarantee, but it breaks the correct rounding guarantee. So for that case we must do something else, and I did not yet really thought carefully about what could be a possible solution. Please see fmtlib/fmt#3649 for more info.
I'm curious if you know what the performance impact of supporting this cleaner output would be?
I'm not sure. I will not be surprised either way. I think I saw supporting evidences in both ways.
I'd also like the
to_chars
to be header only,
To be absolutely honest, I really want to move away from header-only library 😃 constexpr
is the only reason why I keep to_decimal
header-only.
Thanks for your helpful reply. I will probably develop a to_chars
implementation in Glaze using dragonbox::to_decimal
under the hood. This can be an incubator for a potential pull request in the future. I just wanted to make sure I wasn't duplicating efforts.
I appreciate the prompt reply!
Thank you so much!
Using modified code from yyjson and Glaze I was able to incorporate dragonbox and serialize in the desired JSON format.
Here's the current implementation: https://github.com/stephenberry/glaze/blob/main/include/glaze/util/dtoa.hpp
I'm sure the performance can be improved, but it is quite efficient in its current form. Thanks for your library, it made this very straightforward!
Here's the current implementation:
Didn't check all the details but looks pretty good to me. Just have a couple of comments:
As far as I can see this function does not remove trailing zeros, but is named as _trim
. I initially supposed it's named as such because it removes them. I guess maybe it's better to call it _no_trim
to contrast it to the one above it, unless there are other reasons to call it as _trim
?
For IEEE-754 binary32, the exponent is at most of two decimal digits so you don't need this branching.
Your implementation does have the correct rounding issue I mentioned, but of course it is not necessarily a real issue depending on your goal. IIRC the JSON spec doesn't really mandate correct rounding, just roundtrip guarantee, but not sure if I'm remembering correctly. In any case I think it's good to be aware of this.
Thanks for your library, it made this very straightforward!
Really glad my work is being integrated into such a great library!
Also I love those small branchless tricks. Reminds me that I'm not really good at micro-optimizations lol
As far as I can see this function does not remove trailing zeros, but is named as _trim.
Yeah, I had at first intended to trim and then realized it was simpler to use dragonbox's trimming. I updated the name.
For IEEE-754 binary32, the exponent is at most of two decimal digits...
Thanks for this observation.
IIRC the JSON spec doesn't really mandate correct rounding...
Round-tripping and shortness are the primary goals. So, I also think this is okay.
Also, I had to make a fix for denormalized float writing as JSON requires a 0 before the exponent. E.G. -1.E-47 must be written as "-1.0E-47"
IIRC the JSON spec doesn't really mandate correct rounding, just roundtrip guarantee, but not sure if I'm remembering correctly.
So I looked it up, and it seems the spec is very blurry in this respect, and basically allows the implementations to do whatever they want. So.... whatever.