disco-lang/disco

Treating types more like sets

archonandromeda opened this issue · 2 comments

I have an idea for how to implement infinite sets and using the current builtin types as sets.

User-Defined Infinite Sets

An infinite set can be defined as one would expect...

-- the set of all even integers
E : Set(Z)
E = {2x | x in Z}

-- alternate definitions
E = {..-4, -2, 0, 2, 4..}
E = {x | x in Z, 2 // x}

Internally, Disco could store the qualifiers used to define E and use them to determine if a given value is an element of it.

For example...

even : Z -> Bool
even(n) = n elem E

would be functionally equivalent to...

even : Z -> Bool
even(2n) = true
even(_) = false

If the compiler is able to determine lower and upper bounds for the set, it may still become a finite set.

{x | x in Z, -3 < x < 3}
-- {-2, -1, 0, 1, 2}

{n | n in N, x <= 3}
-- {0, 1, 2, 3}

In theory, N/Nat/Natural could be redefined as {0, 1, 2..}, but it would probably be more performant to keep it as a builtin type.

Cardinality

Cardinality is difficult to program in a way that always works, but for simple cases, here are some ideas:

  • The magnitude of any builtin type |T| (except Bool) returns a new type Cardinality
  • A Cardinality value can be either countable or uncountable
    • Optionally, Disco could keep track of how uncountable a set is
    • For example, N * N is uncountable, but not as uncountable as N * N * N
  • All finite values are less than uncountable
  • countable < uncountable, and countable == countable, but uncountable values either cannot be compared or will make use of the "uncountability" above
  • |N| and |Z| are countable
  • |F| and |Q| are also countable
  • Similarly, any infinite set deriving from a countable set/type would also be countable
  • If a user-defined infinite set derives from multiple countable sets, it will become uncountable, optionally storing that "uncountability level" mentioned above
  • Bool could be treated as a finite set of {false, true}
    • |Bool| = 2
    • power(Bool) = {{}, {false}, {true}, {false, true}}
  • |Set(a)| is uncountable
  • Just for consistency, |A -> B| = uncountable and |Prop| = uncountable as well

Use as function domains?

This would probably be much harder to implement, but infinite sets could even be considered subtypes of the types they're sets of. This would allow them, for example, to be used as more restrictive domains for functions.

-- some function that can only work with even integers
foo : E -> N
-- translates to...
foo : Z -> N
foo(x) = userDefinedFoo(x)
-- if possible, the compiler or runtime will check that x satisfies the qualifier 2x

This is all based on ~half a freshman semester's worth of discrete math and I haven't really looked at the code for Disco, so that stuff may be completely infeasible or inaccurate. Take it with a grain of salt.

Thanks for the ideas! Some of this is indeed infeasible, but there are some good ideas in here too. This is something I've thought about before (#290 is a very similar issue with a different emphasis), but your examples illustrate both some additional opportunities as well as additional challenges I hadn't considered before.

-- the set of all even integers
E : Set(Z)
E = {2x | x in Z}

-- alternate definitions
E2 = {..-4, -2, 0, 2, 4..}
E3 = {x | x in Z, 2 divides x}

Your original definition of E is actually not as feasible as it looks: in particular, you write that n elem E could be evaluated as f(2n) = true; f(_) = false. That works fine in this example, but what about something like X = {2x^2 .- 3x | x in Z}? Writing f(2x^2 .- 3x) = true; f(_) = false is not valid syntax; although any expression may be used on the left-hand side of a set comprehension, only very limited sorts of expressions may be used as a pattern in a function definition. More generally, the problem is that although it is easy to compute the value of an expression given a value of x as input, it is much more difficult (even impossible, in general) to invert this, i.e. given the value of an expression, to decide whether there exists a value of x which will make it have that value.

However, your alternative definition which I have labelled E3 above (I assume you meant divides rather than //) would work: the definition literally says how to compute whether a given x is an element of the set or not. E2 could probably be made to work as a special case, by essentially turning it into something like E3. To summarize more concisely, sets defined by applying some function to each element of another set (like E) are difficult to work with, whereas sets defined by filtering another set via some predicate (like E3) are much more feasible to work with.

I think the biggest difficulty would be dealing with cardinality, and deciding whether a given set is finite or not. In general it may be impossible to tell. For example, consider the definition

import num
twinPrimes = { x | x in N, isPrime(x) /\ isPrime(x+2) }

(the isPrime function is available via the built-in num module). This is the set of all twin primes, and it is literally a big open question whether this set is finite or infinite. So if we allow defining infinite sets this way, we have literally no hope of being able to figure out whether a given set is finite or infinite. In turn this means that the cardinality operation |~| would become kind of useless.

Perhaps a way forward would be to have separate types for guaranteed-finite sets (like we have now) and for potentially-infinite sets.

  • Guaranteed-finite sets (Set(a)) would be represented internally as a list of elements; potentially-infinite sets (InfSet(a)?) would be represented as a predicate of type a -> Bool.
  • You can ask for the cardinality of a Set but not of an InfSet.
  • It is possible to convert in one direction via something like inf : Set(a) -> InfSet(a) (defined as basically inf(S) = \s. s elem S)
  • Union, intersection, difference would all work on both types of sets.
  • You could also take the intersection of a Set and an InfSet to get a Set.
  • Complement would work on InfSet but not on Set.
  • It might actually be tricky to figure out the precise restrictions needed to get set comprehension notation to work for InfSet. Anything that implicitly has an exists in it would not work. For example, {y | x in Z, y in Z, y == 2x^2 .- 3x} is basically the same as {2x^2 .- 3x | x in Z}.

Another related issue is #282 .