System/standard library for bignumbers
axic opened this issue · 19 comments
EVM supports 256-bit operations "natively". While that precision is not needed in many cases, it is very useful for certain operations in Ethereum, such as handling token balances.
With wasm one would need to include a bignumber library to do that, potentially in a lot of contracts. This suggests a good opportunity to define a standard bignumber library, which can be used by contracts but whose use is not mandatory.
Due to the design of "host" functions in wasm (having a separation of namespaces) it is possible to design this in a forward-compatible manner, but implement it right now in the clients, such as Hera.
Let's define a new namespace `stdbn` (jk, `bignum` or `stdbignum`) with the following:
- `mul256(a: i32ptr, b: i32ptr, out: i32ptr)`
- `mulmod256(a: i32ptr, b: i32ptr, mod: i32ptr, out: i32ptr)`
- others TBD
All of the arguments are 32-bit pointers to a memory location containing a 256-bit little endian number. The pointers can overlap or be the same.
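To make the memory convention concrete, here is a small sketch (Python, purely illustrative; the helper names `to_le`/`from_le` are hypothetical) of 256-bit little-endian values in linear memory and a `mul256` over them:

```python
# Sketch of the proposed semantics: 256-bit numbers stored as
# 32 little-endian bytes in linear memory. Helper names are hypothetical.

def from_le(mem: bytearray, ptr: int) -> int:
    """Read a 256-bit little-endian number at byte offset ptr."""
    return int.from_bytes(mem[ptr:ptr + 32], "little")

def to_le(mem: bytearray, ptr: int, value: int) -> None:
    """Write value (mod 2**256) as 32 little-endian bytes at ptr."""
    mem[ptr:ptr + 32] = (value % 2**256).to_bytes(32, "little")

def mul256(mem: bytearray, a: int, b: int, out: int) -> None:
    """out = a * b mod 2**256; pointers may overlap or be equal."""
    to_le(mem, out, from_le(mem, a) * from_le(mem, b))

mem = bytearray(96)
to_le(mem, 0, 2**200 + 5)
to_le(mem, 32, 3)
mul256(mem, 0, 32, 64)   # result written at offset 64
assert from_le(mem, 64) == (2**200 + 5) * 3 % 2**256
```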
In the future this could also be implemented as a (system) library: #17.
The main benefit of a "system library" is the predefined address it lives at and the client's ability to make a decision whether it uses a wasm blob or implements it natively.
I'm not sure the `std` prefix makes any difference here.
The name `mul256` is fine because it's the same for signed and unsigned numbers. But the second one should rather be `umulmod256`.
The naming conventions are not very consistent everywhere, but I believe there are 3 options:
- assembly-like: no prefix = signed or sign neutral, `u` prefix = unsigned.
- EVM-like: no prefix = unsigned or sign neutral, `s` prefix = signed.
- pedantic: no prefix = sign neutral, `u` prefix = unsigned, `s` prefix = signed.

I like the pedantic option.
I think we should add all 256-bit methods to the host interface: `add`, `sub`, `mul`, `div`, `mulmod`, `cmp`. We also need to consider support for signed numbers.
We could argue that some of them are fast enough if implemented in Wasm, but the secondary goal here is to reduce code duplication, and it helps with that.
Any comments?
Personally, I feel it would complicate the design if the bignum library only contained the arithmetic methods that are not fast enough in Wasm.
Reducing code duplication is definitely another big benefit of having a "complete" bignum lib. As long as we can minimize host function overhead, this is great for reducing contract bloat.
Potentially we could make use of the references proposal for Wasm. Let's assume the following example is for fixed 256-bit bignums:
bignum.load(memOffset: i32) -> anyref
bignum.store(memOffset: i32, v: anyref)
bignum.mul(a: anyref, b: anyref)
bignum.add(a: anyref, b: anyref)
...
This assumes that opaque references (which is what `anyref` is) would be passed around efficiently in interpreters.
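One way to picture the opaque-reference variant: the host keeps the bignums out of linear memory and hands the contract opaque handles. A sketch (Python; integer handles stand in for `anyref`, and it assumes the arithmetic ops return a fresh reference, which the signatures above leave implicit):

```python
# Sketch: opaque references modelled as integer handles into a host-side
# table, standing in for Wasm anyref. All names here are hypothetical.

class BignumHost:
    def __init__(self, mem: bytearray):
        self.mem = mem
        self.refs = []  # host-side bignum table; index = opaque handle

    def load(self, mem_offset: int) -> int:
        """bignum.load: read 32 LE bytes, return an opaque handle."""
        self.refs.append(int.from_bytes(self.mem[mem_offset:mem_offset + 32], "little"))
        return len(self.refs) - 1

    def store(self, mem_offset: int, ref: int) -> None:
        """bignum.store: write the referenced value back as 32 LE bytes."""
        self.mem[mem_offset:mem_offset + 32] = (self.refs[ref] % 2**256).to_bytes(32, "little")

    def mul(self, a: int, b: int) -> int:
        """bignum.mul: return a new handle to a * b mod 2**256."""
        self.refs.append(self.refs[a] * self.refs[b] % 2**256)
        return len(self.refs) - 1

host = BignumHost(bytearray(96))
host.mem[0:32] = (7).to_bytes(32, "little")
host.mem[32:64] = (6).to_bytes(32, "little")
r = host.mul(host.load(0), host.load(32))
host.store(64, r)
assert int.from_bytes(host.mem[64:96], "little") == 42
```

The benefit of this shape is that intermediate values never round-trip through linear memory; the cost is that the host must manage the lifetime of the reference table.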
Let's agree on this first basic API, which isn't using a bignum stack:
- `addu256(a: i32ptr, b: i32ptr, c: i32ptr)` where `c = a + b`
- `subu256(a, b, c)` where `c = a - b`
- `mulu256(a, b, c)` where `c = a * b`
- `divu256(a, b, c)` where `c = a / b`
- `mulmodu256(a, b, c, d)` where `d = (a * b) % c`
- `cmpu256(a: i32ptr, b: i32ptr) -> i32` where the returned value is less than zero if `a < b`, zero if `a == b`, or larger than zero if `a > b`
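As a non-normative reference for the intended semantics, a sketch over plain integers (the real API passes `i32` pointers). Note the division-by-zero behaviour is an assumption here, borrowed from the EVM convention of returning zero; the thread does not pin it down:

```python
# Reference semantics sketch for the unsigned 256-bit API.
# Results wrap modulo 2**256; x/0 == 0 is an ASSUMED EVM-style convention.
M = 2**256

def addu256(a, b): return (a + b) % M
def subu256(a, b): return (a - b) % M            # wraps on underflow
def mulu256(a, b): return (a * b) % M
def divu256(a, b): return a // b if b else 0     # assumed: x / 0 == 0
def mulmodu256(a, b, c): return (a * b) % c if c else 0  # full-width a*b
def cmpu256(a, b): return (a > b) - (a < b)      # -1, 0, or 1

assert subu256(0, 1) == M - 1
assert mulmodu256(2**255, 4, 7) == (2**257) % 7  # intermediate not truncated
assert cmpu256(3, 5) == -1 and cmpu256(5, 5) == 0
```

Note that `mulmodu256` reduces the full 512-bit intermediate product, matching EVM `MULMOD`, rather than first wrapping `a * b` to 256 bits.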
How about making the first argument the output, so the output position is consistent for different numbers of arguments, e.g. `binary_op(output, input1, input2)` and `unary_op(output, input)`?
I thought about having the output as the first operand, but wasn't sure of the merit. Some POSIX functions do that, but I find some of them confusing.
What would be the benefit we gain? Note, I expect this to change during the upcoming months. We also need to make a decision whether to replace (or add) a stack based version.
What would be the benefit we gain?
Convention. Gnu Multiple Precision and lots of crypto pass output first. But it doesn't matter.
We also need to make a decision whether to replace (or add) a stack based version.
Not sure which algorithms benefit from the stack-based version, other than pathological benchmarks. But it may be simple enough to implement: `bin_op256_stack()` can just call `bin_op256(stack_ptr-32, stack_ptr-32, stack_ptr)` and update the stack pointer.
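The shape being suggested, sketched below (Python; hypothetical names, and one plausible stack layout among several: 32-byte LE slots, operands at `sp-64` and `sp-32`, result replacing the lower slot):

```python
# Sketch: a stack-based wrapper delegating to the pointer-based op.
# Layout assumed here: stack_ptr points just past the top 32-byte slot.

def bin_op256(mem, a_ptr, b_ptr, out_ptr, op):
    a = int.from_bytes(mem[a_ptr:a_ptr + 32], "little")
    b = int.from_bytes(mem[b_ptr:b_ptr + 32], "little")
    mem[out_ptr:out_ptr + 32] = (op(a, b) % 2**256).to_bytes(32, "little")

def bin_op256_stack(mem, stack_ptr, op):
    """Pop two 256-bit operands, push the result; returns new stack_ptr."""
    bin_op256(mem, stack_ptr - 64, stack_ptr - 32, stack_ptr - 64, op)
    return stack_ptr - 32

mem = bytearray(128)
mem[0:32] = (10).to_bytes(32, "little")   # second-from-top operand
mem[32:64] = (4).to_bytes(32, "little")   # top-of-stack operand
sp = bin_op256_stack(mem, 64, lambda a, b: a - b)
assert sp == 32
assert int.from_bytes(mem[0:32], "little") == 6
```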
I agree with @axic here.
Although it's somewhat common in POSIX and other legacy conventions to use `op(dst, src)`, it is far more linguistically intuitive to write `op(src, dst)` when lacking the muscle memory for the former.
I agree with @axic here. Although it's somewhat common in POSIX and other legacy conventions to use `op(dst, src)`, it is far more linguistically intuitive to write `op(src, dst)` when lacking the muscle memory for the former.
This actually goes against many other conventions, e.g. Go and other OOP designs (to some extent): https://golang.org/pkg/math/big/, RISC assembly, the C calling convention.
Also, `a += 1` becomes `add(a, 1, a)` instead of `add(a, a, 1)`.
I prefer the outputs to be the first arguments.
A couple more discussion points:
- What about returning a carry where necessary, e.g. `add256`? @cdetrio is doing it here to maintain compatibility with websnark (the result is also in LE).
- Whether Montgomery multiplication (https://github.com/cdetrio/scout.ts/blob/358b99ea1b35edfde885a82ee1afd84e6e04b840/src/lib.ts#L133) should be part of the API. Probably needs benchmarks.
- How to handle arbitrarily large numbers.
What about returning a carry where necessary, e.g. add256?
Makes sense to return a carry. It may also make sense to have an extra argument for carry.
Whether montgomery multiplication should be part of the api.
It should. Modular multiplication is a bottleneck for elliptic curve crypto. `MontgomeryMultiplication256` is very popular for this. We may also want `MontgomeryReduction256` to transform to/from Montgomery form and to speed up squaring. We may eventually consider `BarrettMultiplication256`, which is popular too. A problem is metering: some of these are faster with non-constant runtime, so we may have to meter conservatively, or break the algorithms into constant-runtime chunks and meter each chunk individually.
How to handle arbitrarily large numbers
@chfast wisely said that anything bigger than `mul256` will have register pressure and a slowdown, and that many operations can be built by composing smaller algorithms. For example, we can build `mul768` from `mul256x256->512`s. For this reason, we may eventually want `mul64x64->128` and `mul128x128->256` as building blocks, but this is less important for now.
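The composition idea, sketched with `mul128x128->256` as the building block (Python; schoolbook splitting, names hypothetical):

```python
# Sketch: building a wide multiply from a fixed-size building block.
# Here a 128x128 -> 256 primitive is composed schoolbook-style into a
# full 256x256 -> 512 multiply using four partial products.
B = 2**128

def mul128(a, b):
    """Building block: exact 128x128 -> 256-bit product."""
    assert a < B and b < B
    return a * b

def mul256_full(a, b):
    """256x256 -> 512 via four 128-bit partial products."""
    a0, a1 = a % B, a // B           # split into low/high 128-bit limbs
    b0, b1 = b % B, b // B
    return (mul128(a0, b0)
            + (mul128(a0, b1) + mul128(a1, b0)) * B
            + mul128(a1, b1) * B * B)

x, y = 2**200 + 123, 2**190 + 456
assert mul256_full(x, y) == x * y
```

A real implementation would propagate carries limb by limb instead of using big-int addition, which is exactly where the register pressure mentioned above comes from.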
What about returning a carry where necessary, e.g. add256?
We kind of agreed to have another set of methods with carry.
Proposing this slightly modified API:
- `add(width: i32, a: i32ptr, b: i32ptr, c: i32ptr): bool` where `c = a + b`, width is for both `a` and `b`, and the return value indicates carry
- `sub(width: i32, a, b, c): bool` where `c = a - b`, and the return value indicates carry (`c < 0`)
- `mul(width: i32, a, b, c): void` where `c = a * b`
- `div(width: i32, a, b, q, r): void` where `q = a / b` and `r = a % b`
- `mulmod(a, b, c, d): void` where `d = (a * b) % c` (not sure if width is needed here?)
- `cmp(width: i32, a: i32ptr, b: i32ptr): i32` where the returned value is less than zero if `a < b`, zero if `a == b`, or larger than zero if `a > b`
Note: these changes only make sense if the overhead of additional parameters and return values is negligible.
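The carry/borrow semantics, sketched (Python; it assumes `width` counts bits, which the proposal leaves open, and models results as wrapping modulo `2**width`):

```python
# Sketch of the width-parameterised add/sub with carry/borrow flags.
# ASSUMPTION: width is in bits; results wrap modulo 2**width.

def add(width, a, b):
    """Return (c, carry) where c = (a + b) mod 2**width."""
    s = a + b
    return s % 2**width, s >= 2**width

def sub(width, a, b):
    """Return (c, borrow) where c = (a - b) mod 2**width."""
    d = a - b
    return d % 2**width, d < 0

c, carry = add(256, 2**256 - 1, 1)
assert (c, carry) == (0, True)         # wraps, carry set
c, borrow = sub(256, 0, 1)
assert (c, borrow) == (2**256 - 1, True)  # wraps, borrow set
```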
Carry for `mul` is not practical. It's a bit of trouble to get, and it is not useful in higher-precision multiplication.