couchbase/fleece

Decimal precision can be lost converting double -> Fleece -> string

Closed this issue · 1 comment

snej commented

Original issue: cbl-dart/cbl-dart#557

Some 64-bit double values lose precision when they are encoded to Fleece and then converted to JSON. For example, 307.79998779296875 ends up as "307.8".

Despite its long decimal form, this number has an exact representation as a 32-bit float. Fleece's encoder detects this and losslessly encodes it as a float instead, saving 4 bytes. The trouble begins when the JSON encoder converts the value to a string: it sees a float and formats it as one, rounding it to 6 significant digits, which produces "307.8".
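The two steps described above (the lossless float32 check, then float-style formatting) can be reproduced outside Fleece. This is a minimal sketch, assuming that a round trip through Python's `struct` `"<f"` format behaves like the C++ double-to-float cast Fleece performs, and that printf-style `%g` (default 6 significant digits) stands in for the JSON encoder's float formatting. The function names are hypothetical.

```python
import struct

def fits_in_float32(x: float) -> bool:
    """True if x survives a round trip through IEEE 754 single precision unchanged."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0] == x
    except OverflowError:
        # Values outside the float32 range cannot be packed at all.
        return False

def format_as_float(x: float) -> str:
    # printf-style "%g" defaults to 6 significant digits, like formatting a float.
    return "%g" % x

def format_as_double(x: float) -> str:
    # 15 significant digits, the precision the issue says a double should get.
    return "%.15g" % x

value = 307.79998779296875
print(fits_in_float32(value))   # True
print(format_as_float(value))   # 307.8
print(format_as_double(value))  # 307.799987792969
```

The same check also accepts the "nice" values mentioned in the next comment, such as 16777216.0 or 0.125, while rejecting 0.1 and 16777217.0, which shows why the width alone cannot tell the formatter how many digits to print.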

This is correct behavior for a float value; the problem is that the string encoder doesn't know that the value began as a double and should be printed with 15 digits of precision.

snej commented

I didn't want to remove this space optimization, since many doubles in documents convert cleanly to 32-bit: any integer up to ±2^24 (about 16.7 million), and many values with simple fractions like .5, .25, .125, etc. The problem is numbers that have long decimal expansions but are still exactly representable in 32 bits, and I don't know a cheap way to tell those apart from the rest.

What I settled on was assigning a meaning to a previously unused bit in the Fleece encoding of a floating-point number: if this bit is set, the value represents a 64-bit double even though it's stored as a 32-bit float, so the isDouble() method returns true for it. This hides the 64-to-32-bit compression from the API entirely. Importantly, the JSON encoder calls isDouble(), so it now formats the problematic numbers with the correct precision.
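The flag-bit approach can be modeled in a few lines. This is a hypothetical sketch only: `FLAG_WAS_DOUBLE`, `EncodedNumber`, and the helper functions are invented for illustration and do not reflect Fleece's actual binary layout or C++ API. It shows the idea that the formatter picks its precision from `is_double()` rather than from the storage width.

```python
import struct
from dataclasses import dataclass

# HYPOTHETICAL layout for illustration only; this is not Fleece's actual binary format.
FLAG_WAS_DOUBLE = 0x1  # hypothetical bit: "stored as float32, but semantically a double"

@dataclass
class EncodedNumber:
    flags: int
    payload: bytes  # 4 bytes (float32) or 8 bytes (float64), little-endian

def encode_double(x: float) -> EncodedNumber:
    """Encode a 64-bit double, shrinking it to 4 bytes when that is lossless."""
    try:
        packed = struct.pack("<f", x)
        if struct.unpack("<f", packed)[0] == x:
            # Lossless 32-bit form: keep the space saving, but remember it was a double.
            return EncodedNumber(FLAG_WAS_DOUBLE, packed)
    except OverflowError:
        pass  # Out of float32 range; fall through to the 8-byte form.
    return EncodedNumber(0, struct.pack("<d", x))

def encode_float(x: float) -> EncodedNumber:
    """Encode a value the caller declared as a 32-bit float; the flag stays clear."""
    return EncodedNumber(0, struct.pack("<f", x))

def decode(n: EncodedNumber) -> float:
    fmt = "<f" if len(n.payload) == 4 else "<d"
    return struct.unpack(fmt, n.payload)[0]

def is_double(n: EncodedNumber) -> bool:
    # True for 8-byte payloads and for 4-byte payloads carrying the flag.
    return len(n.payload) == 8 or bool(n.flags & FLAG_WAS_DOUBLE)

def to_json_number(n: EncodedNumber) -> str:
    # Precision depends on is_double(), not on how many bytes were stored.
    return ("%.15g" if is_double(n) else "%g") % decode(n)
```

With this model, `encode_double(307.79998779296875)` still occupies 4 bytes but formats as `307.799987792969`, while a genuine float stored with `encode_float` keeps the 6-digit `307.8`. A reader that never inspects `flags` would see the flagged value as a plain float, which is the old behavior.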

The internal Slot class, which represents a mutable value, does the same: when it stores a double as a float, it sets the same flag.

It's been a long time since the binary format was changed, but this change is safe and backward-compatible: the encoder always used to write that bit as 0, and decoders ignored it.