Decimal floating point
dumblob opened this issue · 11 comments
It seems that C23 will contain native support for decimal floating point number types. I think new languages (like Passerine) should also provide native support for decimal floating point, as a much safer option than binary floating point. There are many important reasons for this and only one viable argument against it: performance, since emulated decimal floating point makes computation roughly 2x slower on average.
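For reference, the classic binary floating point surprise the safety argument rests on, as a small Rust demonstration (my illustration, not from the thread): neither 0.1 nor 0.2 has an exact base-2 representation, so their sum is not 0.3.

```rust
fn main() {
    // 0.1 and 0.2 both round when stored as f64, and the errors add up.
    let sum = 0.1_f64 + 0.2_f64;
    println!("{}", sum == 0.3); // false
    println!("{:.17}", sum);    // 0.30000000000000004
}
```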
It should preferably become the new default floating point type (by which I simply mean that floating point literals should be treated as decimal floating point literals unless explicitly cast otherwise).
In addition to supporting decimal floating point, I'd like to point out that Passerine should report every lossy conversion from a floating point literal to a binary floating point number at compile time. Otherwise Passerine would be unsafe already in the language specification itself.
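A minimal sketch of the check such a compile-time report implies, with hypothetical names, handling only plain `a.b` literals (no signs or exponents): flag any literal whose value changes when stored as an `f64`, by comparing it against the f64's exact decimal expansion.

```rust
// Split a decimal string into integer and fractional digits, dropping
// redundant zeros so that "0.50" and "0.5000" compare equal.
fn normalize(s: &str) -> (String, String) {
    let (int, frac) = s.split_once('.').unwrap_or((s, ""));
    (
        int.trim_start_matches('0').to_string(),
        frac.trim_end_matches('0').to_string(),
    )
}

fn literal_is_lossy(text: &str) -> bool {
    let x: f64 = text.parse().unwrap();
    // Every finite f64 has an exact, finite decimal expansion; 1074
    // fractional digits is always enough to print it out in full.
    let exact = format!("{:.1074}", x);
    normalize(text) != normalize(&exact)
}

fn main() {
    assert!(!literal_is_lossy("0.5")); // 0.5 = 2^-1, exact in binary
    assert!(literal_is_lossy("0.1"));  // 0.1 has no exact binary form
}
```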
Keeping the current floating point numbers would help with passerine's compatibility with other languages. It is the option with minimal controversy and with the most library support already. I would suggest passerine keep `f64`/`double` backed floats.
Reporting loss would be a neat thing for an upcoming effect system. I would love to see an effect for "loss of precision". It is not unsafe to lose precision, especially for simulations and the like, but it's nice to know when it happens.
Also, I cannot find any information about C23 including decimal floating point. Could you link to where you are getting this? I am interested in reading about it.
> Keeping the current floating point numbers would help with passerine's compatibility with other languages. It is the option with minimal controversy and with the most library support already. I would suggest passerine keep `f64`/`double` backed floats.
It's not either-or - in practice one needs both: binary floating point for compatibility and perhaps some performance-critical computations, and decimal floats everywhere else. Basically the only change I'm proposing here is to treat float literals as decimal float literals. This has all the advantages:
- decimal floats shouldn't need any effects, unlike binary floats (thus it's also syntactically terser)
- it's forward compatible - especially important for casting (unlike binary float literals, which are lossy in nearly all cases, so converting them to decimal floats doesn't make sense because some information has already been lost)
- its widely-used cross-platform emulation library mpdecimal is incredibly fast (only about half the speed of hardware binary floating point)
- it's well defined and works the same everywhere (unlike binary floating point - refer to the "catches all cases" link below)
- decimal floats behave intuitively (unlike binary floating point - refer to the "catches all cases" link below)
- it doesn't interfere in any way with binary floats - it's just a coexisting "overall better float". It simply makes floats much more human-friendly, and if one needs double the speed at the expense of precision loss, one can easily cast to binary float at any time (incl. literals at compile time)
> Reporting loss would be a neat thing for an upcoming effect system. I would love to see an effect for "loss of precision". It is not unsafe to lose precision, especially for simulations and the like, but it's nice to know when it happens.
Generally, binary floats are (very) unsafe as a first-class language construct, because it's impossible to define them in a way that catches all cases. The takeaway: it's by far not only about loss of precision.
> Also, I cannot find any information about C23 including decimal floating point. Could you link to where you are getting this? I am interested in reading about it.
https://en.cppreference.com/w/c/experimental ("Merged into C23")
Btw. I can't imagine how ridiculous it would be if a new ambitious language like Passerine didn't have first-class support for equivalents of the low-level data types that stubborn old C does.
(I have a bad habit of writing complete answers yet forgetting to hit send. Here goes again)
I think having core support for decimal floating points is a great idea - however, I'm not so certain about it being the default behavior.
Decimal floating point is nice because it makes the weird behavior of floating point in general align more closely with our intuition for base 10 - for instance, 0.2 can be represented exactly, but 0.33... cannot. Compare that to binary floating point, where 0.2 can be represented no more exactly than 0.33... .
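To make that intuition concrete, a small illustration using the third-party `rust_decimal` crate as a stand-in for a decimal float type (an assumption on my part - the thread itself discusses mpdecimal):

```rust
use rust_decimal::Decimal;
use std::str::FromStr;

fn main() {
    let a = Decimal::from_str("0.1").unwrap();
    let b = Decimal::from_str("0.2").unwrap();
    // 0.1, 0.2, and 0.3 are all exact in base 10, so this holds.
    assert_eq!(a + b, Decimal::from_str("0.3").unwrap());

    // 1/3 is still not exactly representable in base 10 and must round.
    let third = Decimal::from_str("1").unwrap() / Decimal::from_str("3").unwrap();
    println!("{}", third); // 0.3333333333333333333333333333
}
```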
Before I get into my objections, let me clear a few things up.
> Reporting loss would be a neat thing for an upcoming effect system. I would love to see an effect for "loss of precision". It is not unsafe to lose precision, especially for simulations and the like, but it's nice to know when it happens.
At first, I wasn't sure if this was possible or advisable, as effects are normally reserved for things like concurrency. After closer inspection, it seems like reporting loss at compile time via effects triggered by implicit macros might be possible. This would not add any runtime overhead. I don't think it's a good idea, though: at the very least, it would have to be an optional feature, or something only visible in verbose mode.
> Generally binary floats are (very) unsafe as a first-class language construct because it's impossible to define them in a way that catches all cases.
I view floating-point numbers as approximate representations of numbers meant for speed over accuracy. This is why, for instance, it's common to pull in arbitrary-precision decimal types or to just use integers when working in areas that require a high level of precision, e.g. finance and cryptography.
I'm worried that having decimal floating point numbers by default would introduce a false sense of security. Although the failure modes are 'better' than those of binary floating point, pitfalls still exist. If we were to use decimal fp by default, it and its pitfalls would have to be very carefully implemented, documented, and communicated to users - and that burden would fall on us. I see decimal floating point as a compromise.
This doesn't mean I'm against it - I'd love to have it be a core language feature! My only concerns are over defaults.
> I can't imagine how ridiculous it would be if a new ambitious language like Passerine didn't have first-class support for equivalents of the low-level data types that stubborn old C does.
As we're expanding Passerine's type tree, it's important to take prior art into account, but I think there is more nuance there. C does a lot of things right, but it also does a lot of things in a way... that leaves much to be desired (see, e.g., low-level untagged unions).
Here's how I propose we implement decimal floating point in Passerine:
- Make the lexer keep the decimal form of numbers while lexing, so there is no implicit conversion before the macro step.
- Add a `DecimalFloat` (or similarly named) type to the standard library. The contents of this type are represented using an unsigned integer (a rough sketch of the idea follows this list).
- Implement all the common operations (`add`, `sub`, `mul`, `div`, etc.) on `DecimalFloat` by calling out to the `mpdecimal` library through the FFI. Passerine is a zero-dependency language, so this would have to be provided via the runtime (in, e.g., `aspen`).
- Find `mpdecimal` bindings for Rust, preferably statically linked.
- Make any binary floating-point numbers raise a `FloatLossOfPrecision` effect at compile time, which is ignored by the compiler by default but displayed as a warning when verbose mode is on.
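As promised above, a toy sketch of the `DecimalFloat` idea as coefficient × 10^exponent, just to show why `add` stays exact for decimal literals. The layout is hypothetical (and uses a signed coefficient for brevity); the actual proposal calls into mpdecimal over the FFI.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct DecimalFloat {
    coeff: i64, // coefficient (the proposal above uses an unsigned integer)
    exp: i32,   // power-of-ten exponent
}

impl DecimalFloat {
    fn add(self, other: Self) -> Self {
        // Rescale both operands to the smaller exponent, then add the
        // integer coefficients: no rounding occurs (overflow aside).
        let exp = self.exp.min(other.exp);
        let a = self.coeff * 10i64.pow((self.exp - exp) as u32);
        let b = other.coeff * 10i64.pow((other.exp - exp) as u32);
        DecimalFloat { coeff: a + b, exp }
    }
}

fn main() {
    let x = DecimalFloat { coeff: 1, exp: -1 }; // 0.1
    let y = DecimalFloat { coeff: 2, exp: -1 }; // 0.2
    assert_eq!(x.add(y), DecimalFloat { coeff: 3, exp: -1 }); // exactly 0.3
}
```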
These are just my own two cents. This would be a fairly simple PR and a great first contribution! I can mentor you if you want to implement it - not all the requisite infrastructure is in place, but it should be possible to get a working prototype up and running pretty quickly.
Thanks for bringing this idea to my attention @dumblob, and for your valuable input on effects @ShawSumma.
Thanks for the comprehensive write-up @ShawSumma and @slightknack! Sorry for reacting so late (tomorrow I'm leaving for another country with a very dense schedule until at least the end of August, so don't wait for me for weeks and do what you think is right).
I mostly agree with you, but I'm more afraid of binary floats than you are. Did you read the post I wrote some time ago? If you did and you're still not afraid, then further discussion doesn't make much sense. Some more information about the dangers is in vlang/v#9915 .
To put it bluntly: apart from being human-friendly due to base 10, I'm strongly convinced that decimal floats have an easier-to-understand standard (especially the distribution of precision across the universe of representable numbers, and the rounding modes), with better defaults and better enforcement among implementers, than binary floats.
I'll just reiterate that I strongly believe it should be at least slightly easier to use decimal floats than binary floats in Passerine (this is the primary motivation for opening this issue). And there are truly many ways to achieve that, with many subtle differences, edge cases, and other dark corners:
1. Make decimal floats the default. But it's not necessarily the only approach.
2. Make the number notation (incl. decimal notation) untyped and force the user to always choose explicitly (by casting or one of the many other possible mechanisms), while making automatic number promotions safe (i.e. guaranteed no loss of precision).
3. Introduce a "debug" compilation mode (even for an interpreter), make it the default compilation mode, and in this mode error out if a binary float can't be losslessly represented, with an error message "nudging" the user towards decimal floats.
4. Complicate the language and special-case comparisons (incl. use as keys in maps etc.) and other operations to make binary floats a bit "safer" and easier to use (see the ideas I sketched in vlang/v#5180 (comment) ), by making common issues much more explicit & visible (sometimes at the expense of a small performance slowdown, sometimes zero-cost as a compile-time measure).
5. Any combination of the above.
6. ...
I myself like making decimal floats the default the most, but maybe your vision of Passerine's future is different from mine (unsurprisingly) and this won't fit. In that case I'd prefer either (2) (which is future-proof if we wanted to change our decision later) or (3) (not future-proof).
Regarding your proposal, I don't think it matches the motivation I described - i.e. it doesn't make decimal floats easier to use than binary floats, nor even on par with them (it leaves binary floats as "the only true default", requires typing a long type name for decimal floats everywhere all the time, etc.).
As an example, solution (2) from above (i.e. requiring the user to always specify, with only safe promotions - those guaranteeing no loss of precision) could leverage the fact that `mpdecimal` has arbitrary precision. Then a `d` suffix on any number literal would make it a decimal float, and any of the `f16`/`f32`/`f64`/`f128` suffixes would make it a binary float. This nicely conforms to the motivation - it makes it slightly easier to use decimal floats, because you don't have to think about the size and you always type fewer characters. A possible lexer rule is sketched below.
Any thoughts and ideas?
Thanks for getting back to me quickly, safe travels!
Yes, I've read the article and am fairly familiar with how numbers are represented under floating point, including the fact that it's a loose standard. I think a lot of the issues worth warning about can be taken care of at the language level - for example, Rust has `Eq`, denoting strict equality, and `PartialEq`, denoting, well, partial equality. Floats in Rust only implement the second trait, because they don't have a strict notion of equality (as you've pointed out).
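A small Rust illustration of that design (mine, for reference):

```rust
fn main() {
    // NaN compares unequal to itself, so f64 cannot implement `Eq`
    // (strict equality) and only implements `PartialEq`.
    assert_ne!(f64::NAN, f64::NAN);

    // The same applies to ordering: Vec::<f64>::sort() requires `Ord`
    // and does not compile, so a partial comparison must be spelled out.
    let mut xs = vec![0.3, 0.1, 0.2];
    xs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    println!("{:?}", xs); // [0.1, 0.2, 0.3]
}
```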
With that being said, I think option number 2 (and a bit of 3, and 4) is the best path forwards, with one added caveat: a raw float literal is binary floating point by default (I say this because forcing users to explicitly annotate types is against the spirit of a scripting language), and debug mode is not the default (to prevent the program from producing spurious output when 'just trying to run' something). I think we should design the language in a way that makes decimal floating point types easy to use, even if they're not the default. Here's what I have in mind:
- I've never been a fan of ending literals (e.g. `5f64`) in general. Instead, we could use the name `Dec` to denote decimal types, so making a new decimal is as simple as `x = Dec 27.0`. If this three-letter tag is too verbose (Passerine aims to be concise, after all), we could revisit ending literals later.
- In that same vein, if a raw floating point literal is used in a context where it's inferred to be `Dec`, make the literal a decimal. For example, in `x = Dec 0.1; x += 0.2`, `0.2` is a decimal.
- When converting between types, for example when adding `Real 24.0 + Dec 6.0`, convert to decimal before adding. In Rust notation, there would be an automagically applied implementation of `From<Real> for Dec`, but not the other way around (the conversion still exists, it's just explicit). A rough sketch of this rule follows the list.
- Functions like `sum = a b c -> a + b + c` operate on `Dec` no problem. If you write something like `result: Dec = sum 1 2 3`, the arguments are inferred to be decimal.*
- Like Rust, do something similar so that `Eq` and `Ord` have `Partial` variants.
- etc.
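As mentioned above, a rough Rust sketch of the one-way conversion rule. All names (`Real`, `Dec`) stand in for hypothetical Passerine types, and the decimal representation here is a placeholder, not mpdecimal:

```rust
#[derive(Debug, Clone, Copy)]
struct Real(f64);

#[derive(Debug, Clone, Copy)]
struct Dec {
    coeff: i128, // decimal coefficient
    exp: i32,    // power-of-ten exponent
}

// Real -> Dec is the "widening" direction and may be applied implicitly.
// Every finite f64 has an exact finite decimal expansion, so a real
// implementation could make this lossless; this placeholder truncates
// to four fractional digits for brevity.
impl From<Real> for Dec {
    fn from(r: Real) -> Dec {
        Dec { coeff: (r.0 * 10_000.0) as i128, exp: -4 }
    }
}

// Dec -> Real is the lossy direction, so there is deliberately no `From`
// impl here; the caller has to ask for the conversion explicitly.
impl Dec {
    fn to_real(self) -> Real {
        Real(self.coeff as f64 * 10f64.powi(self.exp))
    }
}

fn main() {
    let d: Dec = Real(24.0).into(); // implicit-style widening
    let r = d.to_real();            // explicit narrowing
    println!("{:?} {:?}", d, r);
}
```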
This was a quick response, so I apologize if I missed anything. The spirit of my point of view is as follows: although binary floating point should be the default (for speed and consistency), decimal fp should also have first-class support. This means it should be just about as easy to construct, and take precedence over binary fp whenever possible.
Hope this helps clear things up! As you can see, I don't really want to commit to anything this early on, as we're talking about something that hasn't been implemented - as mentioned earlier, the requisite infrastructure is not really in place for this to be implemented without a major refactor later on. The above route should allow us to play with having decimal fp in the language, while keeping it open enough to avoid premature commitment. Thanks!
* How? In the early stages of the compiler we represent a raw literal as a `RawNumber` or something. When we do HM type inference, stricter bounds on the `RawNumber` could be inferred. If those bounds are strict, i.e. `Dec` or binary `Real`, we use that type for the literal - but if the bounds remain loose, we select `Real` as our concrete type for that literal.
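A minimal sketch of that defaulting rule, with hypothetical names:

```rust
#[derive(Debug, Clone, PartialEq)]
enum NumTy {
    Raw,  // bounds still loose after inference
    Dec,  // inferred decimal floating point
    Real, // inferred binary floating point
}

fn concretize(inferred: NumTy) -> NumTy {
    match inferred {
        NumTy::Raw => NumTy::Real, // loose bounds default to binary float
        strict => strict,          // strict bounds (Dec or Real) win
    }
}

fn main() {
    assert_eq!(concretize(NumTy::Raw), NumTy::Real);
    assert_eq!(concretize(NumTy::Dec), NumTy::Dec);
}
```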
I think that sounds good to me for now. Just one nitpick:
> The added caveat: a raw float literal is binary floating point by default (I say this because forcing users to explicitly annotate types is against the spirit of a scripting language), and debug mode is not the default (to prevent the program from producing spurious output when 'just trying to run' something).
I think there is a difference between "scripting" and "interactive scripting". For scripting I definitely want all the debug output and full safety the language can offer (I understand scripting as general programming, with the only difference being a compilation time close to 0). But for interactive scripting I want exactly the opposite - no warnings, nothing that would impact (even slightly) my interactive exploration.
So maybe Passerine wants to distinguish between these two "modes" and act accordingly (i.e. the defaults for scripting would be different than for interactive exploration). I don't know.
I think the difference between "scripting" and "interactive scripting" could be a useful distinction. I guess we could map this to two categories:
- "Interactive scripting" is largely while creating/refactoring a script.
- "Scripting" is just the act of building and running a finished script.
The easiest way to map this onto the way aspen is currently set up would be to make the repl be for interactive scripting (perhaps something along the lines of the unison codebase manager, i.e. progressive typechecking and error reporting), and have `run` be for scripting proper.
I'm not sure if this is the perfect line to draw in the sand, but regardless of the above I think Passerine should have a really helpful "let me help you get your project building again" mode, where it tries to produce really friendly and helpful messages while refactoring and such, but once things start to build it'll just be quiet and run things (unless they error, of course). I guess the goal is to catch as many things at compile time as possible, while spending as little time compiling as possible.
Is the Real->Float commit eb0a823 some kind of final decision? If we wanted to have "true" real number literals (which we easily could, using arbitrary-precision arithmetic, e.g. in mpdecimal, as discussed in #51 ), wouldn't this cause backwards-incompatible changes?
Of course, my assumption is that Float is a strict subset of Real semantics-wise.
To clarify, what I'm currently thinking is that, during lexing, number literal tokens will be represented using a string / some other format that is lossless irrespective of base. When macros / the compiler read over the lex tree, they'll have the option to extract the raw base / digit information to create decimal / arbitrary precision integer / etc. representations. This should pave the way for decimal floating point, even if it is an external library at first.
> To clarify, what I'm currently thinking is that, during lexing, number literal tokens will be represented using a string / some other format that is lossless irrespective of base. When macros / the compiler read over the lex tree, they'll have the option to extract the raw base / digit information to create decimal / arbitrary precision integer / etc. representations. This should pave the way for decimal floating point, even if it is an external library at first.
Yes, this would be IMHO ideal. It's basically the same as what e.g. Go does - its numbers are "untyped literals" for as long as possible (typically until some inference rule, not necessarily type inference, casts/coerces them to the context they're in).
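A minimal sketch of such a lossless number-literal token, with hypothetical field names - the digits are kept verbatim, so nothing is committed to a machine representation until inference decides one:

```rust
#[derive(Debug, Clone, PartialEq)]
struct NumToken {
    base: u8,               // 2, 8, 10, 16, ...
    digits: String,         // the literal exactly as written, e.g. "0.1"
    suffix: Option<String>, // e.g. Some("d") or Some("f64"), if present
}

fn main() {
    let tok = NumToken { base: 10, digits: "0.1".into(), suffix: None };
    // Only at this point would the compiler pick a concrete type and
    // (possibly lossily) convert; the token itself has lost nothing.
    println!("{:?}", tok);
}
```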