The output is of the shortest length is a question mark.
Closed this issue · 3 comments
I generated decimal normal number by concatenate to integers together to this effect:
dbl = operand_fractpart / std::pow(10, n); // operand_fractpart is an integer
dbl += operand_intpart; // operand_intpart is also an integer
Then passes a double to DragonDecimal.cpp as attached. I passed 10000 doubles and toStringRaw() output is also attached.
fp_by_dragonbox_decimal_raw8.txt
My code makes sure operand_fractpart is only 8 digits. However we can see -7.018380128700001E2. By right it should be -7.0183801287E2?
However if I tried to verify whether -701.83801287 is a normal number by:
double d = -701.83801287;
memset(buffer, 0x0, sizeof(buffer));
jkj::dragonbox::to_chars(d, buffer);
std::cout << buffer << std::endl;
Then it's ok. I feel a literal double may be different from a dynamic memory double?
Every my 10000 double, there are always a couple of non-shortest values.
mpf_float_50 can print my 10000 number correctly if I call its function str(12). -701.83801287 is -701.83801287.
It's still you are guessing the answer wrongly.
dbl = operand_fractpart / std::pow(10, n); // operand_fractpart is an integer
This line is probably fine. Assuming operand_fractpart = 83801287
, the resulting dbl
will of course not be exactly 0.83801287
, but 0.83801287
is surely the shortest & correctly rounded decimal output of what's stored in dbl
, assuming that the hardware and the compiler are exactly implementing IEEE-754 binary64. In fact, the exact value of dbl
is supposed to be 0.83801287000000002080923877656459808349609375
.
However, a rounding that you seem to be missing is happening here:
dbl += operand_intpart; // operand_intpart is also an integer
Assuming operand_intpart = 701
, what do you think will happen if you add these two numbers? In order to add two floating-point numbers with different exponent, you first have to match the exponent by shifting the mantissa bits, and then add the aligned mantissa bits. In this example, the exponent for 701
is 9
(because dbl = 0.83801287000000002080923877656459808349609375
is -1
(because dbl
). Since the difference is 10, your CPU shifts dbl
's mantissa bits to right by 10-bits, and then add it to the mantissa bits of 701
. Now, what happens is that among these 10-bits that are thrown away, only the MSB is 1 and all other bits are zero, which means that the result of addition is off by exactly 0.5. Since the last bit of this result of addition is 1, the "round-to-nearest, tie-to-even" rule kicks in and the final answer your CPU spits out is this addition result plus 1.
Before this plus 1, the addition result evaluates to 701.8380128699999431773903779685497283935546875
, but after it the result becomes 701.8380128700000568642280995845794677734375
. And the shortest & correctly rounded output for this second one is 701.8380128700001
, which coincides with what Dragonbox outputs.
As a comparison, if you do this:
double d = 701.83801287;
Then d
is initialized with 701.8380128699999431773903779685497283935546875
because that's the closest number to 701.83801287
that is representable within IEEE-754 binary64. And in this case of course the shortest & correctly rounded output is just 701.83801287
.
If you're still not sure what's happening, just try to do this:
double d1 = 701.83801287;
double d2 = 701 + (83801287 / std::pow(10,8));
std::cout << (d1 == d2);
And it prints 0
because d1
and d2
are not same.
As you can see here, it is quite hard to randomly generate floating-point numbers with a prescribed number of decimal digits. Your attempts are just too naive. When I was testing Dragonbox, I used Ryu to accomplish that (now it's replaced by Dragonbox since it's only used for benchmarking now, not testing). And I don't think it's very likely that there actually are much simpler no-brainer ways of doing so. Please be wary of that almost all floating-point operations involve rounding.
It's my bad and I simplified
dbl = operand_fractpart / std::pow(10, n); // operand_fractpart is an integer
dbl += operand_intpart; // operand_intpart is also an integer
to be just dbl = operand_fractpart / std::pow(10, n); // operand_fractpart is an integer
By this way, to make sure it's a solid decimal-->bin64-->decimal trip and now seems ok.
of course double + double is not decimal-->double-->decimal trip any more.