ksh93/ksh

Division by negative numbers returns "division by zero"

leo-ard opened this issue ยท 10 comments

Doing an arithmetic expansion with a division where the dividend is a negative number returns a "division by zero error" :

> ksh
$ echo $(( 10 / -10 ))
ksh:  10 / -10 : divide by zero
$ echo $(( 10 / -1 ))
ksh:  10 / -1 : divide by zero
$ echo $(( 10 / -2 ))
ksh:  10 / -2 : divide by zero

Tested with version sh (AT&T Research) 93u+ 2012-08-01 of ksh. Running on Macos M2 14.1.1 (23B81)

Still reproducible in 93u+m, unfortunately. Thanks for the report.

Kind of amazing it took this long for someone to notice!

So, we forked from the last "stable" (ha!) AT&T release, 93u+ 2012-08-01, inheriting this unfortunate bug.

Testing my stash of compiled historic AT&T ksh versions shows that the bug was introduced in 93q 2005-01-31 and fixed in 93v- 2013-06-28.

So, with the help of the ast-open-archive repo, we should be able to isolate and backport the fix from that beta version.

This is likely the relevant RELEASE file entry from the 2013-06-28 commit:

13-06-04  A bug with the arithmetic expression $((2**63 / -1)) on some
         systems has been fixed.

And here is the 93v- 2013-06-28 change that fixes the bug, as a patch to 93u+m:

diff --git a/src/cmd/ksh93/sh/streval.c b/src/cmd/ksh93/sh/streval.c
index 79f75ee14..ac09e3d01 100644
--- a/src/cmd/ksh93/sh/streval.c
+++ b/src/cmd/ksh93/sh/streval.c
@@ -351,7 +351,7 @@ Sfdouble_t	arith_exec(Arith_t *ep)
 				num = sp[-1]/num;
 				type = 1;
 			}
-			else if((Sfulong_t)(num)==0)
+			else if((Sfulong_t)(num < 0 ? -num : num)==0)
 				arith_error(e_divzero,ep->expr,ep->emode);
 			else if(type==2 || tp[-1]==2)
 				num = U2F((Sfulong_t)(sp[-1]) / (Sfulong_t)(num));

So, the bug was caused by the fact that a floating-point variable with a negative value (num) was typecast to an unsigned integer type (Sfulong_t which is unsigned long long on most systems).

Converting a negative integer to an unsigned integer has implementation-defined results in C. However, as I am learning right now, casting a negative float to an unsigned integer is undefined behaviour according to the C standard. In practice, it behaves differently on Intel systems than it does on ARM systems (including Apple's M1/M2 processors). On ARM systems, the result of the undefined typecast is simply always 0, resulting in this bug.

The patch fixes it by making sure that the number is not negative before typecasting it.

Relevant reading (this article is about C++ but the undefined behaviour is the same in C):
https://embeddeduse.com/2013/08/25/casting-a-negative-float-to-an-unsigned-int/

Wow, what a sneaky bug. Thanks for sharing the debugging process ! It's crazy how little details in the C specification can cause bugs years later.

If you are interested, I found this bug working on pnut, a C to Shell compiler (yes you read this right). We had some division tests that were failing, flagging this bug in the underlying shell implementation. Among the shells we run on, Ksh is often the fastest one (and sometimes by far)

Workaround: make one of the numbers a floating point type by adding a .0 or ,0 (depending on the locale). Assuming LC_NUMERIC=C (or another locale that uses .), the following works without a patch:

$ echo $(( 10.0 / -10 ))                                                                                                           
-1
$ echo $(( 10 / -1.0 ))
-10
$ echo $(( 10 / -2.0 ))
-5

Oh nice trick. My workaround was to negate the divisor and the result when the divisor was negative :

#Lets say we want to divide x by y

if [ $y -lt 0 ] ; then 
    result=$((-(x / -y)))
else 
    result=$((x / y)) 
fi

I introduced a similar bug with the same undefined behaviour myself in 4592859 (see there for explanation). On ARM systems:

$ typeset -ui i
$ echo $((i = -5))
0
$ echo $i
4294967291

The assignment worked fine (wrapped around as expected for an unsigned integer), but the value of the assignment was 0 and should have been the same as the variable's new value.

So lines 272, 274 and 276 have undefined behaviour for a negative n:

else if((attr & NV_UINT64)==NV_UINT64) /* long unsigned integer */
r = (uintmax_t)n;
else if((attr & NV_UINT16)==NV_UINT16) /* short unsigned integer */
r = (uint16_t)n;
else if((attr & NV_UINT32)==NV_UINT32) /* normal unsigned integer */
r = (uint32_t)n;

Patch to test for that other problem: to avoid undefined behaviour, invert the sign of a negative number before the typecast, then invert it again after. Maybe this is naive.

diff --git a/src/cmd/ksh93/sh/arith.c b/src/cmd/ksh93/sh/arith.c
index 01703e45f..6327b6865 100644
--- a/src/cmd/ksh93/sh/arith.c
+++ b/src/cmd/ksh93/sh/arith.c
@@ -269,11 +269,11 @@ static Sfdouble_t arith(const char **ptr, struct lval *lvalue, int type, Sfdoubl
 		else if((attr & NV_DOUBLE)==NV_DOUBLE)		/* normal float */
 			r = (double)n;
 		else if((attr & NV_UINT64)==NV_UINT64)		/* long unsigned integer */
-			r = (uintmax_t)n;
+			r = n < 0 ? -((uintmax_t)-n) : (uintmax_t)n;
 		else if((attr & NV_UINT16)==NV_UINT16)		/* short unsigned integer */
-			r = (uint16_t)n;
+			r = n < 0 ? -((uint16_t)-n) : (uint16_t)n;
 		else if((attr & NV_UINT32)==NV_UINT32)		/* normal unsigned integer */
-			r = (uint32_t)n;
+			r = n < 0 ? -((uint32_t)-n) : (uint32_t)n;
 		else if((attr & NV_INT64)==NV_INT64)		/* long signed integer */
 			r = (intmax_t)n;
 		else if((attr & NV_INT16)==NV_INT16)		/* short signed integer */

Thinking through that second patch. I think it's actually fine, if not obvious. If n is negative, we invert the sign using the unary minus, do the typecast, and then do another unary minus operation -- at which point we're performing a unary minus on a positive unsigned integer value, which is well-defined behaviour in C and yields a positive wrapped-around unsigned integer value, which is fine to implicitly typecast back to r's float type.

There are more terrible arithmetic problems that are a lot harder to solve, but that's for a separate issue, see #771.