Operations in Fortran which are not bit-reproducible

Question

Operations in Fortran which are not bit-reproducible

marshallward opened this issue a year ago · 1 comments

Numerical computation in Fortran uses some form of floating point arithmetic. As a sparse subset of the real numbers, any numerical operation is prone to some kind of estimation. While not explicitly required to use the IEEE-754 model, it is by far the most widely used one.

Although Fortran specifies clear rules for many such operations, there are some areas of ambiguity. This can create challenges for programs which require predictable, bitwise reproducible results.

I have tried to document these areas of ambiguity, along with comments on how to retain some degree of reproducibility.

Order of operations

Expressions containing more than two terms without parentheses can be evaluated in any order.

Example: s = a + b + c has three valid forms of evaluation.
- t = a + b ; s = t + c
- t = b + c ; s = a + t
- t = a + c ; s = t + b
Parentheses can be used to remove this ambiguity by reducing expressions to nested binary operations. The following are unambiguous:
- s = (a + b) + c
- s = a + (b + c)
Note that there may be physical reasons for retaining a particular order of operations, particularly one which preserves a residual value that might otherwise be lost.
Fused multiply-add (FMA)

Expressions of the form a*b + c may be evaluated in either two steps mul(add(a,b),c) or a single step fma(a,b,c). The latter will tend to produce a result of higher accuracy.

If the hardware supports FMA arithmetic, then it is likely that a processor will maximize the number of FMAs in an expression. But there is no obligation to do so, which is another potential source of ambiguity.

Parentheses offer a means to exclude FMA operations, so that the expression becomes unambiguous. For example, (a*b) + c must compute a*b before adding c. However, AFAIK there is no unambiguous way to signal the processor to use an FMA with arithmetic operators.

The ieee_arithmetic module provides the ieee_fma() function, which should be unambiguous but also requires one to abandon mathematical syntax.

See #310 for more discussion.
Intrinsic transcendentals

Numerical functions such as exp(), cos(), bessel_j0(), &c. are implemented by the processor. The methods used to compute these functions will be ambiguous, and thus the values will also be ambiguous.

Some compilers will use implementations of these functions in a known library such as libm, but there is no obligation to do so.

To the best of my knowledge, there is no way to produce bit-reproducible results from these functions. In certain cases, it may be possible to approximate the functions.

(Note: One exception is sqrt(), which AFAIK is required to be exact and correctly rounded. ieee_arithmetic's ieee_support_sqrt can verify if this is correct.)
Exponent operator a**b

The most general form exp(b*log(a)) uses transcendentals and is therefore ambiguous.

However, there are cases which can be evaluated without ambiguity.
- If b is an integer of 2 or 3, then result is independent of order. Higher integers are sensitive to the order of operations.
  
  For higher powers, it may be possible to construct it from unambiguous operations, e.g. a**6 as (a**2)**3
- If a is equal to 2 (or more formally the base of the floating point model) and b is an integer, then exponentiation is reduced to integer addition of the exponent, which is unambiguous.
- If b equals 0.5 (under base 2) then the expression can be reduced to sqrt(a) and the result ought to be exact (under IEEE-754 rules).
Intrinsic operations

Functions with an ambiguous order of operation will produce ambiguous results. For example:
- sum(a)
- prod(a)
- matmul(a,b)
To avoid ambiguity, these must be implemented explicitly. Any potential optimizations in these functions may be lost.
Floating point rounding

Operations which produce results outside of the floating point set will require some kind of rounding operation. IEEE-754 provides several methods. If unspecified or not checked, then the results could change.

The ieee_arithmetic module provides methods to control this, although I personally have no experience with them.

This was submitted in response to the discussion in #310.

Note that this is not necessarily an "issue"; this should be taken as a reference of possible bit reproducibility issues which one may be encounter. An ambiguous expression provides an opportunity for a compiler to optimize the result, such as for a particular platform.

I can edit this top-level comment to include any other examples of interest.

Answer 1 · 2023-09-23T04:01:38.000Z

@marshallward awesome, thanks for writing this up!