User-defined value types
chriseth opened this issue ยท 33 comments
Originally proposed by @maurelian
It should be possible to define a user-defined type that is identical to a provided value type (including operators, members, etc.), but cannot be implicitly converted to any other type, and can only be explicitly converted to and from the underlying value type.
This has the effect that arithmetic operations are impossible to perform across these types.
Example (syntax under discussion):
typedef DistanceInMeters uint;
typedef DistanceInInch uint;
typedef Price fixed128x10;
DistanceInMeters distanceToDestination;
Price buyerPrice;
Price sellerPrice;
In the above, you can do buyerPrice - sellerPrice
, but you cannot do buyerPrice * distanceToDestination
.
Currently, there is a hack to define user defined address types using empty interfaces. (Saw this in a discussion from eth security.)
interface MyAddress {}
function f(address a) returns (address) {
MyAddress myAddress = MyAddress(a);
return address(myAddress);
}
I think user defined types is a good idea. But I think the rules "cannot be implicitly converted" and "automatically derives arithmetic operations" are not consistent.
To take an example from Haskell, type is an alias and can be interchangeably used with the original type. In particular, implicit conversions and operators just work. Whereas newtype creates a new type that has stricter type requirements and can't just be converted to the underlying type. To define operations on it, one has to explicitly declare it.
Example code from Haskell's user defined types
-- Can be interchangeably used with Int
type MyInt = Int
f :: Int -> Int
f x = x + 1
-- Implicit converstion: type checks
a :: MyInt
a = 5
-- Addition and implicit conversion: type checks
b :: MyInt
b = f a
-- Creates a new type, but implicit conversions are not possible
newtype MyNewInt = MyNewInt Int
-- Implicit conversion doesn't type check
-- c :: MyNewInt
-- c = 5
c :: MyNewInt
c = MyNewInt 5
-- Doesn't type check
-- d :: Int
-- d = f c
-- Doesn't type check, since + is not defined
-- d :: MyNewInt
-- d = c + 5
newtype MyNewIntWithPlus = MyNewIntWithPlus Int
instance Num MyNewIntWithPlus where
MyNewIntWithPlus a + MyNewIntWithPlus b = MyNewIntWithPlus (a + b)
MyNewIntWithPlus _ * MyNewIntWithPlus _ = undefined
signum = undefined
abs = undefined
fromInteger = undefined
negate = undefined
d :: MyNewIntWithPlus
d = MyNewIntWithPlus 10
-- Now addition type checks.
e :: MyNewIntWithPlus
e = d + d
main = undefined
I think we should have an explicit syntax that shows that the operators are derived, and by default, none of the operators should be derived.
The syntax could look like
newtype MyInt = uint256 deriving (+, -, *, ==);
using MyInt = uint256 inheriting (+, ==);
As a side comment, I think the operator <
for MyAddress
in the first example code should be disallowed.
Thanks for opening this for me @chriseth, I swear it was still on my todo list. :)
Currently, there is a hack to define user defined address types using empty interfaces. (Saw this in a discussion from eth security.)
I think I might have started that conversation. The original context at Optimism is that we would like to differentiate between addresses on L1 and L2, and prevent them from being used interchangeably.
typedef ethAddress address;
typedef ovmAddress address;
function(ovmAddress oAddr, bytes data) {
oAddr.call(data); // typeEror
}
I also generally like the sound of what @hrkrshnn proposes for valid operators.
On second thought, I'm not sure if we should allow arithmetic operators for user defined types.
The main problem is whether inbuilt safemath should apply for such types, and if yes, should unchecked
blocks do wrapping arithmetic?
Perhaps the only operator that a user defined type should have is ==
and !=
. (Even better would be making this explicit)
I would be okay with just disallowing ==
as well. We currently don't have ==
for tuples either (see #8919). Perhaps, we can think about defining operators on all such types, tuples and structs as a separate issue afterwards.
Example on how equality is handled in Haskell's newtype
newtype MyInt = MyInt Int
a :: MyInt
a = MyInt 10
a' :: MyInt
a' = MyInt 11
-- Doesn't type check, since == is not defined
-- No instance for (Eq MyInt) arising from a use of โ==โ
-- b :: Bool
-- b = a == a'
newtype MyIntWithEq = MyIntWithEq Int
deriving(Eq)
c :: MyIntWithEq
c = MyIntWithEq 10
c' :: MyIntWithEq
c' = MyIntWithEq 20
b :: Bool
b = c == c'
b' :: Bool
b' = c /= c'
main = undefined
A user can define functions to do arithmetic operations on it anyway, where the overflow behaviour is explicit.
newtype MyInt = uint256;
function add(MyInt a, MyInt b) returns (Myint c) {
c = MyInt(uint256(a) + uint256(b));
}
function unchecked_add(MyInt a, Myint b) returns (Myint c) {
unchecked {
c = MyInt(uint256(a) + uint256(b));
}
}
If we want to explicitly list operators, I see this feature very much in line with the "using A for x" statement:
type NewType = uint using (+, g, f, L);
which means that for
NewType a;
NewType b;
you can do
a + b;
a.g();
a.f();
a.t(); // with t being a function of L
Alternatives for the syntax:
type NewType = uint: +, -, f;
type NewType = uint with +, -, f;
type NewType: uint using (+, -, f);
type NewType: uint (+, -, f);
type NewType(uint) using +, -, f;
type NewType is uint using +, -, f;
@leonardoalt mentioned that he would like to see something like Rust's trait system in Solidity. I think we can extend the current notation to traits in the future.
Alternatives for the syntax:
How about reusing the syntax for libraries?
type NewType = uint;
using {+, g, f, L} for NewType;
It would also allow multiple libs to define stuff for the same type.
Separating the using
and type
declarations as in the previous comment sounds like it would allow users of a type to define more arithmetic operations for it, breaking the encapsulation benefits. I may want to define a type and disallow addition by not deriving the +
operator, and I wouldn't want users to be able to add it themselves. Though I guess they'll always be able to cast back to the original type... so it's not full encapsulation, but at least the casting makes it explicit that you're breaking out of the type rules.
About the using syntax:
type NewType = uint;
using {+, g, f, L} for NewType;
The problem I have with this syntax is where is +
coming from? We haven't defined a +
with signature (NewType, NewType) -> NewType
. (I'm assuming that in the above example, g
and f
are functions with signature (NewType, ...) -> ...
and not (uint, ...) -> ...
; And the binding to the latter signature should be a type error.)
The using syntax looks fine, if we want to bind a function with type (NewType, ...) -> ...
to NewType
and call it by NewType(value).f(...)
, but if we want to copy an operator from the uint
to NewType
, it has to be done in the definition.
The intuition for a syntax like
type NewType = uint deriving (==);
would be that there is a typeclass (as in Haskell) or a trait (as in rust) with the name ==
. And the language already implements an instance of ==
on uint
. When a new type is constructed, we derive the implementation of the typeclass or trait from the base type.
It's slightly weird to have typeclass / traits with the name of the operator. In haskell and rust, ==
would be Eq
. For +
, it's Additive
in haskell (or part of Num
, which also includes *
, negate
, etc.) and Add
in rust.
If we want to really go in this direction, we could consider defining proper typeclasses or traits with standard names, such as Eq
, Add
etc. And the following syntax would actually make perfect sense:
type newType = uint deriving (Eq, Add);
The keyword derive
is also present in rust, which does exactly as above. See https://doc.rust-lang.org/rust-by-example/trait/derive.html
I see your point about +
only being defined for uint
, but I don't see why introducing a trait / typeclass would fix the issue. You would still need to have the exception that the compiler defines operators for you.
Another point about the syntax: I think we should not use =
because it looks too much like a type alias. What about type newType is uint;
or type newType: uint;
?
The easiest solution (i.e. let's implement this now) is to disable operators and disable anything implicit.
If you really need your custom addition function divMod(x, y)
to apply to a new integer type (which is exactly just uint256
anyway) then you can rewrite the function.
Requiring explicit casting is a huge safety feature. We can all name languages that use typecasting for safety, mine is Ada.
Proposal:
type L1Address is address;
type L2Address is address;
L1Address base = L1Address(msg.sender);
base = msg.sender; // ERROR
L2Address rollup = L2Address(msg.sender);
base = rollup; // ERROR
type NFTBatchID is uint32;
NFTBatchID batchId = 16; // ERROR
batchId = NFTBatchID(16);
function getBatchFromToken(NFTBatchID batchID); // ABI is getBatchFromToken(uint32)
Yep, I agree, let's do it without operators for now and check for feedback.
Yep, I agree, let's do it without operators for now and check for feedback.
To be clear, this would mean no operators, in particular no ==
, right?
Yes, nothing at all.
Decision: type MyType is uint;
is the syntax. No operators, even ==
.
I think it would be better to have type MyType is uint;
introduce explicit conversion functions as members of MyType.
rather than allowing normal explicit conversions from and to the underlying type.
I think it would be better to have
type MyType is uint;
introduce explicit conversion functions as members ofMyType.
rather than allowing normal explicit conversions from and to the underlying type.
Just to name a few reasons for this:
- Those "conversions" are isomorphisms in the type system you will regularly need for composing generics, so it is important that these functions can be named, if we don't want to have boilerplate of functions calling a single explicit conversion everywhere.
- Conversion is a special syntactic concept that has semantic meaning that can easily differ from plainly accessing a representation. Converting to
uint
for fixed points or floating points, etc., is not the same as retrieving the representation type. So explicit conversions are much closer to an operator than to a way to access representation types or construct abstract types. - While we can easily extend this system to allow defining conversion operators, there is no way to define other differing conversion functions or disallow conversions, if this is the sole atomic way to convert between these types.
- Adding members of the newly defined type is not conflicting or breaking in any way, so there is little headache about how to do this.
While I agree with most of what you say, I think it is important that code implementing functions / operators for user-defined value types is still concise. Otherwise, we could also tackle this problem together with a generic overhaul of the conversion system. The point about functions being nameable can be fixed by also providing such a conversion function on the type or just change it once we have generics.
Are there any suggestions with regards to the syntax?
MyType.fromUnderlying(x);
MyType.toUnderlying(x);
MyType.construct(x);
MyType.destruct(x);
MyType.create(x);
MyType.access(x);
MyType.wrap(x);
MyType.unwrap(x);
Or maybe this is a situation where it is OK to use abbreviations?
type BatchIdentifier is uint32;
I don't see why converting a uint32
to/from a BatchIdentifier
needs to be more complicated than converting a uint32
to/from a uint256
.
Therefore, if uint32 hi = 16; uint256(hi);
is acceptable then so should be uint32 batch 15; BatchIdentifier(batch);
I don't see why converting a
uint32
to/from aBatchIdentifier
needs to be more complicated than converting auint32
to/from auint256
.
For most cases, uint256(x)
and BatchIdentifier(y)
are sufficient, but there are some situations where this is sub-optimal. For example, if you introduce a fixed-point type
type MyFixed is uint;
Then MyFixed x = ...; uint y = uint(x);
could create the impression that y
is the value of x
rounded to an integer, which it is not.
This
type NotAUint is uint;
creates the impression the NotAUint is not a uint. We cannot stop that.
Did you mean this?
type MyFixed is fixed;
Then in that case you should have:
MyFixed x = ...;
uint y = uint(x); // COMPILER ERROR cannot convert from fixed to uint
Did you mean this?
type MyFixed is fixed;
No, I was talking about a user-defined fixed point type that is unrelated to the potentially built-in type fixed
. The same example works if you imagine a user-defined floating point type type MyFloat is uint;
Essentially, the problem is that the whole concept of "conversion" should be handled more thoroughly: In many cases, there is not just one way to convert from one type to another and the conversion function should be named properly depending on what it does and not just that it converts. The functions round(fixed) returns (uint)
, truncate(fixed) returns (uint)
and unwrap(fixed) returns (uint)
make it much clearer what happens than a generically named convert(fixed) returns (uint)
or uint(fixed)
.
I'm still not sold.
On the one line you're telling me A is
B. And then below you're telling me it's not sufficient to convert from A to B without further specifying how.
I'm not going to quote Aristotle here, but that may be abusing the meaning of "is".
I'm still not sold.
On the one line you're telling me A
is
B. And then below you're telling me it's not sufficient to convert from A to B without further specifying how.I'm not going to quote Aristotle here, but that may be abusing the meaning of "is".
Sure, we can also discuss using a different keyword than is
, but I think this is unrelated to the problem that the conversion between float
/ fixed
and the underlying uint
type is always ambiguous and maybe issue comments is not the right place for such a very interactive discussion.
type MyType is uint;
means that MyType
is an abstraction of uint
, not identical to it. If it were strictly identical, then it would have to be a mere alias and all operators would need to be inherited and MyType
and uint
would need to be substitutible everywhere (resp. implicitly convertible). That's clearly not what we want, so the is
is clearly not a statement of identity (funnily that's exactly the flaw in Aristotle's conception of logic and language in general, but I won't go there :-)).
I'd be fine with changing from is
to anything else, but as far as I'm concerned we also don't have to.
Technically, what type definitions like this mean is to introduce a new abstract type that has a given representation type. That's why I'm also used to the explicit conversion functions to be called abstraction (or short abs
) and representation (or short rep
), but I'd concede that - while it's very precise - it may seem weird without being used to it.
Decision: type MyType is uint; is the syntax. No operators, even ==.
I'm still unsure about this. type MyType(uint);
looked more natural to me, which is also what Rust uses for "newtype".
Decision from design call: We do not allow any conversions, not even from or to the underlying type. This can be implemented by users using inline assembly. We might support it in the future.
Decision from design call: We do not allow any conversions, not even from or to the underlying type. This can be implemented by users using inline assembly. We might support it in the future.
With the syntax type MyInt is uint256;
it still suggests uint256
is a very special underlying type here, but after the above decisions it just means that is the storage type and the type used in the selector calculation for the ABI.
How about making this completely user defined:
type MyInt { storageType = uint256, signature = "uint256" };
(signature
is optional and defaults to the signature of the storageType
)
This borrows the "call modifier" syntax and the idea from #11819.
I think this could be very confusing if you use a different storage type than a signature. I would opt for going with is
for now, and if there is the need to define a custom signature, we can still extend it.
Something has to determine the stack layout as well, for which we pretty much need an underlying type. (We have value types that use more than one stack slot and in the longer run I think there may be uses for tuples and stuff like that - although that'll need more thought). Although that line of thinking does raise some question about accessing these things in inline assembly, but we don't need to answer that now :-) (and can just disallow using function types as underlying types for now, probably not that useful at first anyways).
EDIT: or in short: I'd also say รฌs
with an underlying type is good for now.
Decision from design call: We do not allow any conversions
Is assembly going to be required for defining the basic operations of these types? Normally one would allow conversion within a restricted scope and use that to define basic operations, and then disallow those conversions outside of that scope where only the public interface of the type can be used.
@frangio We've currently decided to go with wrap
and unwrap
functions together with the type. For example, for type MyAddress is address
, there will be the function MyAddress.wrap
with signature address -> MyAddress
and similarly, MyAddress.unwrap
with signature MyAddress -> address
.
Normally one would allow conversion within a restricted scope and use that to define basic operations, and then disallow those conversions outside of that scope where only the public interface of the type can be used.
This did come up in our discussion. The (quick) conclusion was to consider this at a later stage.