ewasm/design

System/standard library for bignumbers

axic opened this issue · 19 comments

axic commented

EVM supports 256-bit operations "natively". While that precision is not needed in many cases, it is very useful for certain operations in Ethereum, such as handling token balances.

With wasm, one would need to include a bignumber library to do that, potentially in many contracts. This suggests a good opportunity to define a standard bignumber library, which contracts may use but are not required to.

Due to the design of "host" functions in wasm (having a separation of namespaces), it is possible to design this in a forward-compatible manner, yet implement it right now in the clients, such as Hera.

Let's define a new namespace stdbn (just kidding: bignum or stdbignum) with the following:

  • mul256(a: i32ptr, b: i32ptr, out: i32ptr)
  • mulmod256(a: i32ptr, b: i32ptr, mod: i32ptr, out: i32ptr)
  • others TBD

All of the arguments are 32-bit pointers to a memory location containing a 256-bit little-endian number. The pointers can overlap or be the same.
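To make the layout concrete, here is a hypothetical C sketch of how a client could implement mul256 natively (the u256 struct and limb-based loop are my own framing, not part of the proposal): the 256-bit value is treated as four little-endian 64-bit limbs, and overflow past 2^256 is discarded.

```c
#include <stdint.h>

/* A 256-bit number as four 64-bit limbs, least-significant limb first
   (matching the little-endian layout described above). */
typedef struct { uint64_t limb[4]; } u256;

/* mul256: out = a * b mod 2^256 (schoolbook multiplication; partial
   products above 2^256 are simply dropped). Overlapping pointers are
   allowed because the result is accumulated in a local before the
   final store. */
static void mul256(const u256 *a, const u256 *b, u256 *out) {
    u256 r = {{0, 0, 0, 0}};
    for (int i = 0; i < 4; i++) {
        uint64_t carry = 0;
        for (int j = 0; i + j < 4; j++) {
            unsigned __int128 t = (unsigned __int128)a->limb[i] * b->limb[j]
                                + r.limb[i + j] + carry;
            r.limb[i + j] = (uint64_t)t;
            carry = (uint64_t)(t >> 64);
        }
    }
    *out = r;
}
```

Accumulating into a local `r` is what makes "the pointers can overlap or be the same" cheap to support.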

In the future this could also be implemented as a (system) library: #17.

The main benefit of a "system library" is the predefined address it lives at and the client's ability to make a decision whether it uses a wasm blob or implements it natively.

I'm not sure the std prefix makes any difference here.

The name mul256 is fine because it's the same for signed and unsigned numbers. But the second one should rather be umulmod256.

The naming conventions are not very consistent everywhere, but I believe there are 3 options:

  • assembly-like: no prefix means signed or sign-neutral; u prefix means unsigned.
  • EVM-like: no prefix means unsigned or sign-neutral; s prefix means signed.
  • pedantic: no prefix means sign-neutral; u prefix means unsigned; s prefix means signed.

axic commented

I like the pedantic option.

axic commented

I think we should add all 256-bit methods to the host interface: add, sub, mul, div, mulmod, cmp. Also need to consider support for signed numbers.

We could argue that some of them are fast enough if implemented in Wasm, but the secondary goal here is to reduce code duplication, and it helps with that.

Any comments?

Personally I feel like it would complicate the design for the bignum library to only contain the arithmetic methods that are not fast enough in wasm.

Reducing code duplication is definitely another big benefit of having a "complete" bignum lib. As long as we can minimize host function overhead, this is great for reducing contract bloat.

axic commented

Potentially we could make use of the references proposal for Wasm. Let's assume the following example is for fixed 256-bit long bignums:

bignum.load(memOffset: i32) -> anyref
bignum.store(memOffset: i32, v: anyref)
bignum.mul(a: anyref, b: anyref)
bignum.add(a: anyref, b: anyref)
...

This assumes that opaque references (which is what anyref is) would be passed around efficiently in interpreters.

axic commented

Let's agree on this first basic API, which isn't using a bignum stack:

  • addu256(a: i32ptr, b: i32ptr, c: i32ptr) where c = a + b
  • subu256(a, b, c) where c = a - b
  • mulu256(a, b, c) where c = a * b
  • divu256(a, b, c) where c = a / b
  • mulmodu256(a, b, c, d) where d = (a * b) % c
  • cmpu256(a: i32ptr, b: i32ptr) -> i32 where the returned value is less than zero if a < b, zero if a == b or larger than zero if a > b
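To pin down the intended semantics, here is a hypothetical C model of addu256 (the byte-pointer framing is an assumption of mine, not part of the proposal): both operands are read from linear memory as 256-bit little-endian values, added with carry propagation, and the result wraps modulo 2^256. The memcpy trick assumes a little-endian host, matching wasm's little-endian linear memory.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of addu256 from the proposed API: c = a + b mod 2^256.
   Each i32ptr argument is modeled as a pointer into linear memory
   holding a 256-bit little-endian value (read as four 64-bit limbs;
   this byte-to-limb reinterpretation assumes a little-endian host). */
static void addu256(const uint8_t *a, const uint8_t *b, uint8_t *c) {
    uint64_t la[4], lb[4], lc[4];
    memcpy(la, a, 32);
    memcpy(lb, b, 32);
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        unsigned __int128 t = (unsigned __int128)la[i] + lb[i] + carry;
        lc[i] = (uint64_t)t;
        carry = (uint64_t)(t >> 64);
    }
    memcpy(c, lc, 32);  /* overflow past 2^256 wraps silently */
}
```

subu256, mulu256, and the rest would follow the same load/compute/store shape.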

@chfast @s1na what do you think?

poemm commented

How about making the first argument the output, so the output position is consistent for different numbers of arguments? For example, binary_op(output, input1, input2) and unary_op(output, input).

axic commented

I thought about having the output as the first operand, but wasn't sure of the merit. Some POSIX functions do that, but I find some of them confusing.

What would be the benefit we gain? Note, I expect this to change during the upcoming months. We also need to make a decision whether to replace (or add) a stack-based version.

poemm commented

What would be the benefit we gain?

Convention. GNU Multiple Precision (GMP) and lots of crypto libraries pass the output first. But it doesn't matter.

We also need to make a decision whether to replace (or add) a stack based version.

Not sure which algorithms benefit from the stack based version, other than pathological benchmarks. But it may be simple enough to implement -- bin_op256_stack() can just call bin_op256(stack_ptr-32, stack_ptr-32, stack_ptr) and update the stack pointer.
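The wrapper described above can be sketched in C as follows (all names here are hypothetical; a small add256 is included only to exercise the wrapper). The stack is an array of 256-bit values: a binary op pops two operands and pushes the result in place.

```c
#include <stdint.h>

typedef struct { uint64_t limb[4]; } u256;

/* Any flat binary op with the (a, b, out) signature. */
typedef void (*bin_op256_fn)(const u256 *a, const u256 *b, u256 *out);

/* Example operand: c = a + b mod 2^256, used below only as a demo op. */
static void add256(const u256 *a, const u256 *b, u256 *out) {
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        unsigned __int128 t = (unsigned __int128)a->limb[i] + b->limb[i] + carry;
        out->limb[i] = (uint64_t)t;
        carry = (uint64_t)(t >> 64);
    }
}

/* stack_top points one past the topmost value; the two topmost values
   are replaced by the result, and the new (shrunk) top is returned. */
static u256 *bin_op256_stack(bin_op256_fn op, u256 *stack_top) {
    op(&stack_top[-2], &stack_top[-1], &stack_top[-2]);
    return stack_top - 1;
}
```

This matches the comment's point: the stack flavor is a thin adapter over the flat pointer-based version plus a stack-pointer update.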

I agree with @axic here. Although it's somewhat common in POSIX and other legacy conventions to use op(dst, src), it is far more linguistically intuitive to write op(src, dst) when lacking the muscle memory for the former.


This actually goes against many other conventions, e.g. Go and other OOP designs (to some extent): https://golang.org/pkg/math/big/, RISC assembly, the C calling convention.

Also a += 1 becomes add(a, 1, a) instead of add(a, a, 1).

axic commented

@chfast can you clarify which convention you referred to or which one you prefer?

I prefer the outputs to be the first arguments.

s1na commented

A couple more discussion points:

poemm commented

What about returning a carry where necessary, e.g. add256?

Makes sense to return a carry. It may also make sense to have an extra argument for carry.

Whether montgomery multiplication should be part of the api.

It should. Modular multiplication is a bottleneck for elliptic curve crypto. MontgomeryMultiplication256 is very popular for this. We may also want MontgomeryReduction256 to transform to/from Montgomery form, and to speed up squaring. We may eventually consider BarrettMultiplication256, which is popular too. A problem is metering: some of these are faster with non-constant runtime, so we may have to meter conservatively, or break the algorithms into constant-runtime chunks and meter each chunk individually.
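For illustration, here is a 64-bit scale model of Montgomery multiplication (REDC); the function names are hypothetical, and a real MontgomeryMultiplication256 would run the same reduction limb by limb over four limbs. For simplicity it assumes n < 2^63, so the 128-bit intermediate sum cannot overflow.

```c
#include <stdint.h>

/* n' = -n^-1 mod 2^64 for odd n, via Newton iteration: each step
   doubles the number of correct low bits (odd n starts with 3). */
static uint64_t mont_nprime(uint64_t n) {
    uint64_t inv = n;            /* correct mod 2^3 for odd n */
    for (int i = 0; i < 5; i++)
        inv *= 2 - n * inv;      /* now correct mod 2^64 */
    return (uint64_t)0 - inv;
}

/* REDC: returns a * b * 2^-64 mod n, for odd n < 2^63 and a, b < n.
   The multiple m of n is chosen so the low 64 bits of t + m*n vanish,
   making the division by 2^64 an exact shift. */
static uint64_t montmul64(uint64_t a, uint64_t b, uint64_t n, uint64_t nprime) {
    unsigned __int128 t = (unsigned __int128)a * b;
    uint64_t m = (uint64_t)t * nprime;
    unsigned __int128 u = (t + (unsigned __int128)m * n) >> 64;
    return u >= n ? (uint64_t)(u - n) : (uint64_t)u;
}
```

The metering concern shows up even here: the final conditional subtraction makes the runtime input-dependent unless it is made branchless.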

How to handle arbitrarily large numbers

@chfast wisely said that anything bigger than mul256 will have register pressure and a slowdown. And that many operations can be built by composition of smaller algorithms. For example, we can build mul768 from mul256x256->512s. For this reason, we may eventually want mul64x64->128 and mul128x128->256 as building blocks, but this is less important for now.
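The composition idea can be sketched in C: a mul64x64->128 primitive, then a mul128x128->256 built from it by schoolbook accumulation (the names are mine, chosen to mirror the text; the same pattern scales to mul256x256->512 and to building mul768 out of such pieces).

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

/* The mul64x64->128 building block: full 128-bit product of two
   64-bit operands. */
static u128 mul64x64_128(uint64_t a, uint64_t b) {
    unsigned __int128 t = (unsigned __int128)a * b;
    u128 r = { (uint64_t)t, (uint64_t)(t >> 64) };
    return r;
}

/* mul128x128->256 by composition: out[0..3] is the full 256-bit
   product of two 2-limb little-endian numbers. */
static void mul128x128_256(const uint64_t a[2], const uint64_t b[2],
                           uint64_t out[4]) {
    out[0] = out[1] = out[2] = out[3] = 0;
    for (int i = 0; i < 2; i++) {
        uint64_t carry = 0;
        for (int j = 0; j < 2; j++) {
            u128 p = mul64x64_128(a[i], b[j]);
            unsigned __int128 t = (unsigned __int128)out[i + j] + p.lo + carry;
            out[i + j] = (uint64_t)t;
            carry = p.hi + (uint64_t)(t >> 64);  /* cannot overflow */
        }
        out[i + 2] += carry;
    }
}
```

Keeping all intermediate products in full precision is what lets larger widths be assembled without losing carries.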

axic commented

What about returning a carry where necessary, e.g. add256?

We kind of agreed to have another set of methods with carry.

s1na commented

Proposing this slightly modified API:

  • add(width: i32, a: i32ptr, b: i32ptr, c: i32ptr): bool where c = a + b, width is for both a and b, and return value indicates carry
  • sub(width: i32, a, b, c): bool where c = a - b, and the return value indicates borrow (i.e. it is set when a < b)
  • mul(width: i32, a, b, c): void where c = a * b
  • div(width: i32, a, b, q, r): void where q = a / b and r = a % b
  • mulmod(a, b, c, d): void where d = (a * b) % c (not sure if width is needed here?)
  • cmp(width: i32, a: i32ptr, b: i32ptr): i32 where the returned value is less than zero if a < b, zero if a == b or larger than zero if a > b

Note: these changes only make sense if the overhead of additional parameters and return values is negligible.
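A C model of the width-parameterized add might look like this (the function name is hypothetical, and I'm assuming width is given in bytes and is a multiple of 8; the proposal doesn't pin that down):

```c
#include <stdint.h>

/* Sketch of the proposed add: c = a + b over `width` bytes of
   little-endian data (width assumed to be a multiple of 8), returning
   1 on carry out of the top limb and 0 otherwise. */
static int32_t bignum_add(uint32_t width, const uint64_t *a,
                          const uint64_t *b, uint64_t *c) {
    uint64_t carry = 0;
    for (uint32_t i = 0; i < width / 8; i++) {
        unsigned __int128 t = (unsigned __int128)a[i] + b[i] + carry;
        c[i] = (uint64_t)t;
        carry = (uint64_t)(t >> 64);
    }
    return (int32_t)carry;  /* the proposed bool return value */
}
```

The loop itself is width-agnostic, which is the appeal of this variant: one host function covers 256-bit balances and wider crypto operands alike.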

Carry for mul is not practical. It's a bit of trouble to get, and it is not useful in higher-precision multiplication.

s1na commented

@chfast Thanks for the input. Removed carry for mul and mulmod.