User-defined value types

Question

User-defined value types

chriseth opened this issue 4 years ago · 33 comments

Originally proposed by @maurelian

It should be possible to define a user-defined type that is identical to a provided value type (including operators, members, etc.), but cannot be implicitly converted to any other type, and can only be explicitly converted to and from the underlying value type.

This has the effect that arithmetic operations are impossible to perform across these types.

Example (syntax under discussion):

typedef DistanceInMeters uint;
typedef DistanceInInch uint;
typedef Price fixed128x10;

DistanceInMeters distanceToDestination;
Price buyerPrice;
Price sellerPrice;

In the above, you can do buyerPrice - sellerPrice, but you cannot do buyerPrice * distanceToDestination.

Answer 1 · 2021-06-16T10:28:43.000Z

Currently, there is a hack to define user defined address types using empty interfaces. (Saw this in a discussion from eth security.)

interface MyAddress {}

function f(address a) returns (address) {
	MyAddress myAddress = MyAddress(a);
	return address(myAddress);
}

I think user defined types is a good idea. But I think the rules "cannot be implicitly converted" and "automatically derives arithmetic operations" are not consistent.

To take an example from Haskell, type is an alias and can be interchangeably used with the original type. In particular, implicit conversions and operators just work. Whereas newtype creates a new type that has stricter type requirements and can't just be converted to the underlying type. To define operations on it, one has to explicitly declare it.

Example code from Haskell's user defined types

-- Can be interchangeably used with Int
type MyInt = Int

f :: Int -> Int
f x = x + 1

-- Implicit converstion: type checks
a :: MyInt
a = 5

-- Addition and implicit conversion: type checks
b :: MyInt
b = f a

-- Creates a new type, but implicit conversions are not possible
newtype MyNewInt = MyNewInt Int

-- Implicit conversion doesn't type check
-- c :: MyNewInt
-- c = 5

c :: MyNewInt
c = MyNewInt 5

-- Doesn't type check
-- d :: Int
-- d = f c

-- Doesn't type check, since + is not defined
-- d :: MyNewInt
-- d = c + 5

newtype MyNewIntWithPlus = MyNewIntWithPlus Int

instance Num MyNewIntWithPlus where
  MyNewIntWithPlus a + MyNewIntWithPlus b = MyNewIntWithPlus (a + b)
  MyNewIntWithPlus _ * MyNewIntWithPlus _ = undefined
  signum = undefined
  abs = undefined
  fromInteger = undefined
  negate = undefined

d :: MyNewIntWithPlus
d = MyNewIntWithPlus 10

-- Now addition type checks.
e :: MyNewIntWithPlus
e = d + d

main = undefined

I think we should have an explicit syntax that shows that the operators are derived, and by default, none of the operators should be derived.

The syntax could look like

newtype MyInt = uint256 deriving (+, -, *, ==);
using MyInt = uint256 inheriting (+, ==);

As a side comment, I think the operator < for MyAddress in the first example code should be disallowed.

Answer 2 · 2021-06-16T16:11:33.000Z

Thanks for opening this for me @chriseth, I swear it was still on my todo list. :)

Currently, there is a hack to define user defined address types using empty interfaces. (Saw this in a discussion from eth security.)

I think I might have started that conversation. The original context at Optimism is that we would like to differentiate between addresses on L1 and L2, and prevent them from being used interchangeably.

typedef ethAddress address;
typedef ovmAddress address;

function(ovmAddress oAddr, bytes data) {
     oAddr.call(data);   // typeEror
}

I also generally like the sound of what @hrkrshnn proposes for valid operators.

Answer 3 · 2021-06-21T09:35:19.000Z

On second thought, I'm not sure if we should allow arithmetic operators for user defined types.

The main problem is whether inbuilt safemath should apply for such types, and if yes, should unchecked blocks do wrapping arithmetic?

Perhaps the only operator that a user defined type should have is == and !=. (Even better would be making this explicit)
I would be okay with just disallowing == as well. We currently don't have == for tuples either (see #8919). Perhaps, we can think about defining operators on all such types, tuples and structs as a separate issue afterwards.

Example on how equality is handled in Haskell's newtype

newtype MyInt = MyInt Int

a :: MyInt
a = MyInt 10

a' :: MyInt
a' = MyInt 11

-- Doesn't type check, since == is not defined
-- No instance for (Eq MyInt) arising from a use of ‘==’
-- b :: Bool
-- b = a == a'

newtype MyIntWithEq = MyIntWithEq Int
                    deriving(Eq)

c :: MyIntWithEq
c = MyIntWithEq 10

c' :: MyIntWithEq
c' = MyIntWithEq 20

b :: Bool
b = c == c'

b' :: Bool
b' = c /= c'

main = undefined

A user can define functions to do arithmetic operations on it anyway, where the overflow behaviour is explicit.

newtype MyInt = uint256;

function add(MyInt a, MyInt b) returns (Myint c) {
   c = MyInt(uint256(a) + uint256(b));
}

function unchecked_add(MyInt a, Myint b) returns (Myint c) {
  unchecked {
    c = MyInt(uint256(a) + uint256(b));
  }
}

Answer 4 · 2021-06-23T12:54:03.000Z

If we want to explicitly list operators, I see this feature very much in line with the "using A for x" statement:

type NewType = uint using (+, g, f, L);

which means that for

NewType a;
NewType b;

you can do

a + b;
a.g();
a.f();
a.t(); // with t being a function of L

Alternatives for the syntax:

type NewType = uint: +, -, f;
type NewType = uint with +, -, f;
type NewType: uint using (+, -, f);
type NewType: uint (+, -, f);
type NewType(uint) using +, -, f;
type NewType is uint using +, -, f;

Answer 5 · 2021-06-23T12:55:14.000Z

@leonardoalt mentioned that he would like to see something like Rust's trait system in Solidity. I think we can extend the current notation to traits in the future.

Answer 6 · 2021-06-23T13:08:06.000Z

Alternatives for the syntax:

How about reusing the syntax for libraries?

type NewType = uint;
using {+, g, f, L} for NewType;

It would also allow multiple libs to define stuff for the same type.

Answer 7 · 2021-06-24T00:25:02.000Z

Separating the using and type declarations as in the previous comment sounds like it would allow users of a type to define more arithmetic operations for it, breaking the encapsulation benefits. I may want to define a type and disallow addition by not deriving the + operator, and I wouldn't want users to be able to add it themselves. Though I guess they'll always be able to cast back to the original type... so it's not full encapsulation, but at least the casting makes it explicit that you're breaking out of the type rules.

Answer 8 · 2021-06-24T12:59:29.000Z

About the using syntax:

type NewType = uint;
using {+, g, f, L} for NewType;

The problem I have with this syntax is where is + coming from? We haven't defined a + with signature (NewType, NewType) -> NewType. (I'm assuming that in the above example, g and f are functions with signature (NewType, ...) -> ... and not (uint, ...) -> ...; And the binding to the latter signature should be a type error.)

The using syntax looks fine, if we want to bind a function with type (NewType, ...) -> ... to NewType and call it by NewType(value).f(...), but if we want to copy an operator from the uint to NewType, it has to be done in the definition.

The intuition for a syntax like

type NewType = uint deriving (==);

would be that there is a typeclass (as in Haskell) or a trait (as in rust) with the name ==. And the language already implements an instance of == on uint. When a new type is constructed, we derive the implementation of the typeclass or trait from the base type.

It's slightly weird to have typeclass / traits with the name of the operator. In haskell and rust, == would be Eq. For +, it's Additive in haskell (or part of Num, which also includes *, negate, etc.) and Add in rust.

If we want to really go in this direction, we could consider defining proper typeclasses or traits with standard names, such as Eq, Add etc. And the following syntax would actually make perfect sense:

type newType = uint deriving (Eq, Add);

The keyword derive is also present in rust, which does exactly as above. See https://doc.rust-lang.org/rust-by-example/trait/derive.html

Answer 9 · 2021-06-30T09:47:30.000Z

I see your point about + only being defined for uint, but I don't see why introducing a trait / typeclass would fix the issue. You would still need to have the exception that the compiler defines operators for you.

Another point about the syntax: I think we should not use = because it looks too much like a type alias. What about type newType is uint; or type newType: uint;?

Answer 10 · 2021-07-20T15:37:32.000Z

As a reference: this was discussed earlier in #1100 (and #1013, though this one shifted to non-typedef discussions).

I think one benefit now of user defined types could be that they show up as internalType in the ABI, allowing for more precise off-chain decoding/encoding/validation of calldata.

Answer 11 · 2021-07-20T18:17:57.000Z

The easiest solution (i.e. let's implement this now) is to disable operators and disable anything implicit.

If you really need your custom addition function divMod(x, y) to apply to a new integer type (which is exactly just uint256 anyway) then you can rewrite the function.

Requiring explicit casting is a huge safety feature. We can all name languages that use typecasting for safety, mine is Ada.

Proposal:

type L1Address is address;
type L2Address is address;
L1Address base = L1Address(msg.sender);
base = msg.sender; // ERROR
L2Address rollup = L2Address(msg.sender);
base = rollup; // ERROR

type NFTBatchID is uint32;
NFTBatchID batchId = 16; // ERROR
batchId = NFTBatchID(16);
function getBatchFromToken(NFTBatchID batchID); // ABI is getBatchFromToken(uint32)

Answer 12 · 2021-07-26T16:17:29.000Z

Yep, I agree, let's do it without operators for now and check for feedback.

Answer 13 · 2021-07-27T08:31:02.000Z

Yep, I agree, let's do it without operators for now and check for feedback.

To be clear, this would mean no operators, in particular no ==, right?

Answer 14 · 2021-07-27T09:09:13.000Z

Yes, nothing at all.

Answer 15 · 2021-08-04T12:20:56.000Z

Decision: type MyType is uint; is the syntax. No operators, even ==.

Answer 16 · 2021-08-20T11:33:33.000Z

I think it would be better to have type MyType is uint; introduce explicit conversion functions as members of MyType. rather than allowing normal explicit conversions from and to the underlying type.

Answer 17 · 2021-08-23T13:20:03.000Z

I think it would be better to have type MyType is uint; introduce explicit conversion functions as members of MyType. rather than allowing normal explicit conversions from and to the underlying type.

Just to name a few reasons for this:

Those "conversions" are isomorphisms in the type system you will regularly need for composing generics, so it is important that these functions can be named, if we don't want to have boilerplate of functions calling a single explicit conversion everywhere.
Conversion is a special syntactic concept that has semantic meaning that can easily differ from plainly accessing a representation. Converting to uint for fixed points or floating points, etc., is not the same as retrieving the representation type. So explicit conversions are much closer to an operator than to a way to access representation types or construct abstract types.
While we can easily extend this system to allow defining conversion operators, there is no way to define other differing conversion functions or disallow conversions, if this is the sole atomic way to convert between these types.
Adding members of the newly defined type is not conflicting or breaking in any way, so there is little headache about how to do this.

Answer 18 · 2021-08-30T12:49:57.000Z

While I agree with most of what you say, I think it is important that code implementing functions / operators for user-defined value types is still concise. Otherwise, we could also tackle this problem together with a generic overhaul of the conversion system. The point about functions being nameable can be fixed by also providing such a conversion function on the type or just change it once we have generics.

Are there any suggestions with regards to the syntax?

MyType.fromUnderlying(x);
MyType.toUnderlying(x);

MyType.construct(x);
MyType.destruct(x);

MyType.create(x);
MyType.access(x);

MyType.wrap(x);
MyType.unwrap(x);

Or maybe this is a situation where it is OK to use abbreviations?

Answer 19 · 2021-08-30T16:54:06.000Z

type BatchIdentifier is uint32;

I don't see why converting a uint32 to/from a BatchIdentifier needs to be more complicated than converting a uint32 to/from a uint256.

Therefore, if uint32 hi = 16; uint256(hi); is acceptable then so should be uint32 batch 15; BatchIdentifier(batch);

Answer 20 · 2021-08-31T07:17:00.000Z

I don't see why converting a uint32 to/from a BatchIdentifier needs to be more complicated than converting a uint32 to/from a uint256.

For most cases, uint256(x) and BatchIdentifier(y) are sufficient, but there are some situations where this is sub-optimal. For example, if you introduce a fixed-point type

type MyFixed is uint;

Then MyFixed x = ...; uint y = uint(x); could create the impression that y is the value of x rounded to an integer, which it is not.

Answer 21 · 2021-08-31T07:41:48.000Z

This

type NotAUint is uint;

creates the impression the NotAUint is not a uint. We cannot stop that.

Did you mean this?

type MyFixed is fixed;

Then in that case you should have:

MyFixed x = ...;
uint y = uint(x); // COMPILER ERROR cannot convert from fixed to uint

Answer 22 · 2021-08-31T07:59:20.000Z

Did you mean this?
type MyFixed is fixed;

No, I was talking about a user-defined fixed point type that is unrelated to the potentially built-in type fixed. The same example works if you imagine a user-defined floating point type type MyFloat is uint;

Answer 23 · 2021-08-31T08:01:25.000Z

Essentially, the problem is that the whole concept of "conversion" should be handled more thoroughly: In many cases, there is not just one way to convert from one type to another and the conversion function should be named properly depending on what it does and not just that it converts. The functions round(fixed) returns (uint), truncate(fixed) returns (uint) and unwrap(fixed) returns (uint) make it much clearer what happens than a generically named convert(fixed) returns (uint) or uint(fixed).

Answer 24 · 2021-08-31T08:26:31.000Z

I'm still not sold.

On the one line you're telling me A is B. And then below you're telling me it's not sufficient to convert from A to B without further specifying how.

I'm not going to quote Aristotle here, but that may be abusing the meaning of "is".

Answer 25 · 2021-08-31T08:34:17.000Z

I'm still not sold.

On the one line you're telling me A is B. And then below you're telling me it's not sufficient to convert from A to B without further specifying how.

I'm not going to quote Aristotle here, but that may be abusing the meaning of "is".

Sure, we can also discuss using a different keyword than is, but I think this is unrelated to the problem that the conversion between float / fixed and the underlying uint type is always ambiguous and maybe issue comments is not the right place for such a very interactive discussion.

Answer 26 · 2021-09-01T11:49:32.000Z

type MyType is uint; means that MyType is an abstraction of uint, not identical to it. If it were strictly identical, then it would have to be a mere alias and all operators would need to be inherited and MyType and uint would need to be substitutible everywhere (resp. implicitly convertible). That's clearly not what we want, so the is is clearly not a statement of identity (funnily that's exactly the flaw in Aristotle's conception of logic and language in general, but I won't go there :-)).

I'd be fine with changing from is to anything else, but as far as I'm concerned we also don't have to.

Technically, what type definitions like this mean is to introduce a new abstract type that has a given representation type. That's why I'm also used to the explicit conversion functions to be called abstraction (or short abs) and representation (or short rep), but I'd concede that - while it's very precise - it may seem weird without being used to it.

Answer 27 · 2021-09-01T15:04:22.000Z

Decision: type MyType is uint; is the syntax. No operators, even ==.

I'm still unsure about this. type MyType(uint); looked more natural to me, which is also what Rust uses for "newtype".

Answer 28 · 2021-09-01T15:34:35.000Z

Decision from design call: We do not allow any conversions, not even from or to the underlying type. This can be implemented by users using inline assembly. We might support it in the future.

Answer 29 · 2021-09-01T15:48:23.000Z

Decision from design call: We do not allow any conversions, not even from or to the underlying type. This can be implemented by users using inline assembly. We might support it in the future.

With the syntax type MyInt is uint256; it still suggests uint256 is a very special underlying type here, but after the above decisions it just means that is the storage type and the type used in the selector calculation for the ABI.

How about making this completely user defined:

type MyInt { storageType = uint256, signature = "uint256" };

(signature is optional and defaults to the signature of the storageType)

This borrows the "call modifier" syntax and the idea from #11819.

Answer 30 · 2021-09-01T16:37:59.000Z

I think this could be very confusing if you use a different storage type than a signature. I would opt for going with is for now, and if there is the need to define a custom signature, we can still extend it.

Answer 31 · 2021-09-01T16:48:57.000Z

Something has to determine the stack layout as well, for which we pretty much need an underlying type. (We have value types that use more than one stack slot and in the longer run I think there may be uses for tuples and stuff like that - although that'll need more thought). Although that line of thinking does raise some question about accessing these things in inline assembly, but we don't need to answer that now :-) (and can just disallow using function types as underlying types for now, probably not that useful at first anyways).

EDIT: or in short: I'd also say ìs with an underlying type is good for now.

Answer 32 · 2021-09-07T13:12:55.000Z

Decision from design call: We do not allow any conversions

Is assembly going to be required for defining the basic operations of these types? Normally one would allow conversion within a restricted scope and use that to define basic operations, and then disallow those conversions outside of that scope where only the public interface of the type can be used.

Answer 33 · 2021-09-07T13:32:10.000Z

@frangio We've currently decided to go with wrap and unwrap functions together with the type. For example, for type MyAddress is address, there will be the function MyAddress.wrap with signature address -> MyAddress and similarly, MyAddress.unwrap with signature MyAddress -> address.

Normally one would allow conversion within a restricted scope and use that to define basic operations, and then disallow those conversions outside of that scope where only the public interface of the type can be used.

This did come up in our discussion. The (quick) conclusion was to consider this at a later stage.