hjson/hjson-cpp

how to separate int values like `123` from floating point values like `123.0`

bstarynk opened this issue · 18 comments

How should I separate values decoded from 123 (as an int64_t) from values decoded from 123.0 (as a double)?

I miss a bool Hjson::Value::is_int64() const member function.

(This is for http://refpersys.org/ BTW)

Regards

--
Basile Starynkevitch, Bourg La Reine, France
http://starynkevitch.net/Basile/
basile@starynkevitch.net

https://github.com/bstarynk/hjson-cpp contains a fix, which I have submitted as pull request #23

In what cases would you need to separate 123 from 123.0? The specification of JSON/Hjson only deals with numbers; it doesn't treat integers differently from floats. So I am reluctant to add something to the implementation of hjson-cpp that isn't specified in the format.

> In what cases would you need to separate 123 from 123.0?

In http://refpersys.org/

Think of it (today in 2019) as a persistent Lisp-like programming language (semantically close to Guile, syntactically very different). The entire heap is persisted as several Hjson files. Our integers are actually 63-bit (not 64-bit) tagged integers, and our doubles are actually heap-allocated boxed IEEE 754 doubles.

That is why we need to separate an integer 123 from a floating point 123.0

The same holds for JSON, yet the C++ JSON library JsonCpp does make the distinction. I consider that library a good one, and relevant prior art for this issue.

And in C++17 (see n4640 in practice), double and int64_t are handled very differently (AFAIK, C++ has no standard single number type), and the x86-64 ABI uses different conventions to pass them.

See https://en.wikipedia.org/wiki/X86_calling_conventions and of course https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI more specifically https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf notably its §3.2.3

Indeed, the abstract specification of both JSON and Hjson only speaks of numbers, but at the C++ implementation level 64-bit integers and 64-bit IEEE 754 floats are very different beasts.

I'm aware of how C++ handles int64_t and double, but 123 can be represented without precision loss either way. How the value is stored internally in hjson-cpp is an implementation detail. If I add a function is_int64(), that detail would become public instead. I'm reluctant to do that; I'll have to think about it.

But the number 1152921504606846975 (that is 2 power 60, minus 1) cannot be represented exactly as an IEEE 754 double (the mantissa there is 53 bits, so it rounds to 1152921504606846976, which is 2 power 60). In RefPerSys, we need to keep the integer 1152921504606846976 different from the floating point number 1152921504606846976.0.
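A small check of the rounding involved, as a sketch rather than anything taken from hjson-cpp: 2 power 60 is a power of two and so is exact as a double, while 2 power 60 minus 1 needs 60 significant bits and rounds up to 2 power 60. The helper names are hypothetical, and the sketch assumes the default round-to-nearest mode.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert an integer to double and back.
// Restricted to |n| <= 2^62 so that the rounded double is always
// inside the int64_t range (converting an out-of-range double is undefined).
std::int64_t round_trip_through_double(std::int64_t n) {
    assert(n >= -(INT64_C(1) << 62) && n <= (INT64_C(1) << 62));
    double d = static_cast<double>(n);    // rounds to the nearest double
    return static_cast<std::int64_t>(d);  // exact: d is a whole number in range
}

// Hypothetical helper: true when n survives the round trip unchanged,
// i.e. when n has an exact double representation.
bool exactly_representable(std::int64_t n) {
    return round_trip_through_double(n) == n;
}
```

With these helpers, `exactly_representable(1152921504606846976)` holds, while `round_trip_through_double(1152921504606846975)` returns 1152921504606846976: once both values have gone through a double, they can no longer be told apart.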

You could also notice that PostgreSQL and SQLite handle 64-bit integers and doubles differently, so they are further prior art. And QVariant might also inspire you.

Of course, you could have had some INT64 value in Hjson::Value::Type, but for some reason you did not do it like that. My opinion is that it would have been preferable.

If you really wanted numbers to be the same, you would have used std::variant<std::int64_t,double> as your number type in C++ (but that could become a performance disaster, so I hope you would avoid that choice).

Yes that's why I added int64_t internally, but the issue here is something else. You can already check if the number represented by an Hjson::Value can be represented by int64_t or double for example by using the function modf.
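For reference, a minimal sketch of the kind of modf check mentioned here. It works on a plain double; how that double is obtained from an Hjson::Value is left out, and the helper name is hypothetical, not part of hjson-cpp.

```cpp
#include <cmath>

// Hypothetical helper: true when d is a whole number that fits in int64_t.
bool is_integral_int64(double d) {
    if (!std::isfinite(d)) return false;  // rejects NaN and infinities
    double int_part = 0.0;
    if (std::modf(d, &int_part) != 0.0) return false;  // has a fractional part
    // 2^63 is exactly representable as a double; int64_t covers [-2^63, 2^63).
    return int_part >= -9223372036854775808.0 &&
           int_part < 9223372036854775808.0;
}
```

Note that this only answers whether the value is whole: 123 and 123.0 both pass, so the check says nothing about how the number was originally written.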

But what you are asking for here is that information should be saved in the Hjson::Value about whether it was created from an integer or from a float, no matter if the value can be represented both ways.

> You can already check if the number represented by an Hjson::Value can be represented by int64_t or double

Please explain for 1152921504606846976 vs 1152921504606846975.0 noticing that mathematically they are not the same value (differing by 1.0)

I didn't add INT64 as a type because then a lot of code using Hjson would need to check if the value is of type DOUBLE or INT64 instead of just checking if it is the type DOUBLE. That could be confusing since JSON/Hjson only has a number type, not an integer or float type.

Hence my patch (and pull request #23). It does not break your code. RefPerSys absolutely needs to separate integers from doubles.

> > You can already check if the number represented by an Hjson::Value can be represented by int64_t or double
>
> Please explain for 1152921504606846976 vs 1152921504606846975.0 noticing that mathematically they are not the same value (differing by 1.0)

For values in the range 1152921504606846976 to 9223372036854775807 there would be no reason to store them as double, since the decimals would not be stored for them, right?

This is an implementation detail, and not an answer to my simple question: how can I separate 1152921504606846976 from 1152921504606846975.0? I really need them to be different.

By accessing them with the function to_int64() from the Hjson::Value.

I think they would give the same int64_t value. What makes you believe they won't?

See http://floating-point-gui.de/ for more. Also Fluctuat and Pequan and Cadna and this paper by my colleague Franck Védrine.

Beware that floating point is a very tricky issue; I don't understand it well.

See also Frama-Clang and Clang static analyzer. Both are open source tools working on Linux, capable of analyzing some C++ programs.

True if the value was unmarshalled from an Hjson string 1152921504606846975.0, but that's because all strings containing a dot will be stored as double internally in Hjson. Your patch does not change that. I could add a special check for trailing .0 when parsing the number in the unmarshal function.
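The rule described here (a dot in the text selects double) can be sketched as a tiny classifier. This is an illustration only, not the actual hjson-cpp parser: the names are hypothetical, and treating an exponent marker like a dot is an extra assumption.

```cpp
#include <string>

// Hypothetical tag for how a number token would be stored.
enum class NumberKind { Int64, Double };

// Hypothetical classifier for an already validated number token.
// A dot selects double; as an added assumption, so does an exponent
// marker ('e' or 'E'). Everything else is treated as an integer.
NumberKind classify_number_token(const std::string& token) {
    return token.find_first_of(".eE") != std::string::npos
               ? NumberKind::Double
               : NumberKind::Int64;
}
```

Under this rule "123" is classified as Int64 and "123.0" as Double, which is the distinction the thread is about; a special case for a trailing .0 would erase it.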

> that's because all strings containing a dot will be stored as double internally in Hjson.

Which is of course the behavior I am expecting (and similar to the behavior of JsonCpp). RefPerSys needs to handle double and integers differently

> I could add a special check for trailing .0 when parsing the number in the unmarshal function.

Please don't.

> > that's because all strings containing a dot will be stored as double internally in Hjson.
>
> Which is of course the behavior I am expecting (and similar to the behavior of JsonCpp). RefPerSys needs to handle double and integers differently

And that was my point about what you really are asking for: You are not asking for a way to know if it would be best to treat the value as a double or as an int64_t, you are asking for a way to know if .0 was present in the Hjson string. And that's something I'm reluctant to do and will have to think about.

No, I am asking for behavior similar to JsonCpp's: separate integers from floating points. Simply because in C++ they really are different, and in C or C++ or OCaml code 1.0 is never the same as 1

Conceptually the Hjson number abstract data type is a tagged union (or sum type). At some point you need to deal with that fact, since C++ does not have a number type equivalent to what Hjson (or idealized Javascript, or JSON) wants. My opinion is that JsonCpp could be inspirational.
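To make the tagged-union view concrete, here is a minimal C++17 sketch. The `Number` type and its members are hypothetical and are not hjson-cpp's actual representation; the sketch only shows that the tag can be queried while the numeric value stays comparable.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical number type: a tagged union of the two C++ representations.
struct Number {
    std::variant<std::int64_t, double> repr;

    // True when the value is stored as an integer.
    bool is_int64() const {
        return std::holds_alternative<std::int64_t>(repr);
    }

    // Numeric value as a double, whichever alternative is stored.
    // Large integers may lose precision in this conversion.
    double to_double() const {
        return std::visit([](auto v) { return static_cast<double>(v); }, repr);
    }
};
```

For example, `Number{std::int64_t{123}}` and `Number{123.0}` compare equal through `to_double()`, yet only the first reports `is_int64()`. Code that only cares about "a number" can ignore the tag, and code like RefPerSys can inspect it.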

This issue is now mentioned on http://refpersys.org/