purescript-contrib/purescript-js-bigints

Something is wrong with fromStringAs method

martyall opened this issue · 2 comments

Something is wrong with fromStringAs vs fromString. fromStringAs seems to be unable to parse anything over the 32bit int limit

> import JS.BigInt as BI                                       
> BI.fromString  "0x7FFFFFFF"
(Just 2147483647)

> BI.fromString  "0x8FFFFFFF"
(Just 2415919103)

> BI.fromStringAs  hexadecimal "7FFFFFFF"
(Just 2147483647)

> BI.fromStringAs  hexadecimal "8FFFFFFF"
Nothing

Here is the relevant section of JS source:

        if (pattern.test(s)) {
          var size = 10;
          var factor = BigInt(radix ** size);
          var r = 0n

          for (var i = 0; i < s.length; i += size) {
            var n = parseInt(s.slice(i, i + size), radix);

            // check for NaN
            if ((n | 0) !== n) {
              return nothing;
            }

            r = r * factor + BigInt(n);
          }

          return just(r);
        } else {
          return nothing;
        }
      };
    };

It seems like if you hard code the slice window size as 10, in this case you are potentially trying to convert
8FFFFFFF into a JS int before arithmetically folding it in to the accumulator, which is why it's failing.

At the very least we should be able to accommodate all of the radixes in Data.Int that have names, and in the current representation I believe hexadecimal and base36 are broken. I feel like the move here is to set the size = 6 in the above code, sacrificing speed in some cases for correct behavior in all. The reason for 6 is that

2147483647 (base 10) == ZIK0ZJ (base 36)

So it was weird to me that the cap seems to be around 32-bit MAX_VALUE, when Number.MAX_SAFE_INTEGER is 2^53-1. 8FFFFFFF fits perfectly fine into a regular Number:

image

I think the issue is in the NaN check:

// check for NaN
if ((n | 0) !== n) {
  return nothing;
}

Per MDN for bitwise OR, if you're doing bitwise OR and one or both of your operands aren't BigInts, they'll get coerced down to 32-bit values. This is pretty easy to demonstrate:
image

So dropping the window down to 6 would work around the issue, since you'll never exceed 32BIT_MAX, at the cost of speed as @martyall notes. Alternatively, reworking the NaN check a little bit, lets you keep the 10-char window.
Per MDN for parseInt, you shouldn't ever get anything besides a (integer) Number or NaN, so the check could simply be.

// check for NaN
if (n === NaN) {
  return nothing;
}

I have actually tried both of the solutions proposed on this thread and neither was able to pass this test

https://github.com/martyall/purescript-js-bigints/pull/1/files#diff-71732b478b4808898d86c8591ad7ab46d8122c1e4facec4a9151ac49efba905dR83-R86

Here is evidence of your master branch failing: https://github.com/martyall/purescript-js-bigints/actions/runs/6154957373/job/16701107739