AndrasKovacs/elaboration-zoo

Zoo zone for introducing inductive data types?

anqurvanillapy opened this issue · 8 comments

I think the introduction of inductive data types would be great, but it may bring in more problems due to recursion/induction:

  • How to do (dependent) pattern matching (this is a big topic)? Or just use a generated recursor/induction principle like Lean's T.rec_on for simple structural recursion?
  • What about well-founded recursion? Hmmm
  • What about termination checking, if general recursion is not prohibited?

Any plans for examples of this kind? BTW, ❤️ this repo so much for offering great examples of making elaborators!

AndrasKovacs commented

Some pros and cons:

Eliminators (induction principles)

  • Formally nice.
  • Relatively simple to implement.
  • Can be inconvenient. E.g. injectivity and disjointness of constructors are not built-in and have to be proven and explicitly invoked, which makes proof terms larger. More complicated induction, e.g. lexicographic, requires more setup than in e.g. Agda, where it often works automatically. In setoid/observational TT, constructor disjointness/injectivity can be made to hold by definition, so in that setting eliminators can be significantly more ergonomic than in MLTT.
  • Nested inductive types such as data Tree a = Node a [Tree a] are not nicely supported in any existing system which generates eliminators. It's possible to do better, but here we can't just copy existing solutions. (See the sketch after this list.)
  • Requires more care with unfolding and displaying definitions, since we usually don't want to flood users with eliminators.
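
To make the nested-type point concrete, here is the Tree type in Haskell together with the recursion principle one would actually want for it. This is a sketch for illustration; foldTree is a hypothetical name, not something generated by the systems discussed.

-- The nested inductive type from the bullet above.
data Tree a = Node a [Tree a]

-- The recursion principle we'd want: the "inductive hypothesis" is
-- available for every subtree inside the nested list. Naively generated
-- eliminators typically provide no hypothesis for those subtrees.
foldTree :: (a -> [b] -> b) -> Tree a -> b
foldTree f (Node a ts) = f a (map (foldTree f) ts)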

Pattern matching + termination checking

  • Formally ugly. Conversion checking and evaluation are a bit more complicated.
  • Requires termination checking if we want logical consistency (a minimal sketch of the simplest check follows this list).
  • There's a wide range of complexity and sophistication when it comes to pattern matching. The simplest is miniTT's dependent case split; the strongest is Agda's matching. Something like a miniTT case split without termination checking is quite simple to implement, while the Agda solution requires far more effort.
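
As a sketch of the simplest possible termination check mentioned above: a structural check for a single function recursing on one argument, where recursive calls are only allowed on variables bound as strict subterms of that argument. All names here (Tm, Rec, terminates) are hypothetical:

data Tm
  = Var String
  | Rec String                  -- recursive call "f x"
  | Case String Tm (String, Tm) -- Case x zeroBranch (p, sucBranch);
                                -- p binds the predecessor of x

-- 'arg' is the formal parameter, 'smaller' the variables known to be
-- strict subterms of it; recursive calls are only allowed on the latter.
terminates :: String -> [String] -> Tm -> Bool
terminates arg smaller tm = case tm of
  Var _           -> True
  Rec x           -> x `elem` smaller
  Case x z (p, s) ->
    let smaller' | x == arg || x `elem` smaller = p : smaller
                 | otherwise                    = smaller
    in terminates arg smaller z && terminates arg smaller' s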

It would be good to implement several versions of inductive types here. Plain inductive families à la Dybjer is a good choice. Ideally, both the eliminator version and the miniTT case split version would be included, with perhaps termination checking for the latter. A sketch contrasting the two core representations follows.
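
Here is a minimal sketch contrasting the two core representations for natural numbers (hypothetical constructors, not code from this repo):

data Tm
  = Var Int | App Tm Tm | Lam Tm
  | Zero | Suc Tm
  -- Eliminator version. Intended typing:
  --   NatElim (P : Nat → U) (z : P Zero)
  --           (s : (n : Nat) → P n → P (Suc n)) (n : Nat) : P n
  | NatElim Tm Tm Tm Tm
  -- miniTT-style case split: one flat, non-nested branch per constructor;
  -- NatCase n zeroBranch sucBranch, where sucBranch binds the predecessor.
  | NatCase Tm Tm Tm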

The existing code is actually rather unsatisfactory in some places (see the repo issues), so I'd like to revise it first. After that, I might get to inductive types.

minad commented

@AndrasKovacs It seems that the major difficulty is implementing inductive families. If those are not supported, eliminators and case splits are equally simple to implement. As far as I can see, the miniTT you mentioned does not support inductive families.

In order to support inductive families, you either add first-order unification (the solution, conflict, cycle and deletion rules) with rewrites/substitution, or you internalise the index unification, cluttering everything with coercions and no-confusion proofs.
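
For concreteness, a minimal sketch of first-order unification with exactly those four rules, over plain constructor terms (hypothetical code, not tied to any of the implementations discussed):

import qualified Data.Map as M

data Tm = RVar String | Con String [Tm] deriving (Eq, Show)

type Sub = M.Map String Tm

-- Apply a substitution, chasing bindings recursively.
apply :: Sub -> Tm -> Tm
apply s (RVar x)   = maybe (RVar x) (apply s) (M.lookup x s)
apply s (Con c ts) = Con c (map (apply s) ts)

occurs :: String -> Tm -> Bool
occurs x (RVar y)   = x == y
occurs x (Con _ ts) = any (occurs x) ts

unify :: Sub -> Tm -> Tm -> Maybe Sub
unify s t u = go (apply s t) (apply s u)
  where
    go (RVar x) (RVar y)
      | x == y                = Just s                  -- deletion
    go (RVar x) u'
      | occurs x u'           = Nothing                 -- cycle (occurs check)
      | otherwise             = Just (M.insert x u' s)  -- solution
    go t' u'@(RVar _)         = go u' t'                -- flip to reuse the variable case
    go (Con c ts) (Con d us)
      | c /= d                = Nothing                 -- conflict
      | otherwise             = unifyMany s ts us       -- congruence

unifyMany :: Sub -> [Tm] -> [Tm] -> Maybe Sub
unifyMany s []     []     = Just s
unifyMany s (t:ts) (u:us) = unify s t u >>= \s' -> unifyMany s' ts us
unifyMany _ _      _      = Nothing

Note that index unification inside a typechecker also has to deal with stuck/neutral index terms, which this sketch ignores.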

In Coq/Lean they internalise the unification, while in Agda everything is in the typechecker. There are also possibilities which are half-way between eliminators and full Agda-style pattern-matching, e.g., explicit case trees with unification of indices. There is a proposal by Barras et al. (A New Elimination Rule for the Calculus of Inductive Constructions), where some substitutions are applied to the result type.

Since you already have substitution in the evaluator to support metavariables/elaboration, maybe adding first-order unification with substitution is not a big concern? But overall I am quite unsure about the "best way" to implement inductive families. It would be great to hear your thoughts!

AndrasKovacs commented

To me, the distinguishing feature of miniTT case splits is the lack of nesting and argument permutation, when compared to Agda patterns. So I view it as a minimal version of pattern matching. I don't think it's difficult to add index unification on top of simple case splits, especially if we don't care about without-K. So case split + inductive families is a good demo implementation. I would actually just skip non-indexed inductive types in this repository, and go for inductive families.

I think that if we only want to support inductive families but not more general inductive types, then Agda-style pattern matching is convenient and efficient (in terms of core representation and evaluation), so I'd vote for that.

On the other hand, with more general inductive types and perhaps more support for metaprogramming with algebraic theories and signatures, I would instead choose eliminators, because I'd want to work on as few new and complicated features as possible. I think that no-confusion bloat can be handled just fine, in terms of performance, in a reasonably efficient implementation (which Coq & Agda aren't). Moreover, in setoid TT no-confusion holds by definition, so there's no bloat. So in my current project about setoid TT implementation I'm planning to use eliminators.

minad commented

So I view it as a minimal version of pattern matching. I don't think it's difficult to add index unification on top of simple case splits, especially if we don't care about without-K.

Yes, this works. I implemented something like that in an NbE typechecker which is similar to yours and @ollef's. But I am also interested in full pattern matching on the frontend, and I am not convinced that full Agda-style pattern matching should be part of the core representation.

I am wondering what a good core representation and target language for the elaboration procedure should look like, if the goal is to have a core language which can be checked by an independent typechecker. Ideally this typechecker should have significantly lower complexity than the elaborator.

  1. pattern matching lambdas (too complicated?)
  2. case-splits with index-unification (sufficiently readable core, still needs unification and substitution, maybe not low-level enough)
  3. eliminators (bloated core, no unification, simple)

In particular the unification aspect worries me, since it is an additional mechanism needed purely to support inductive families, while it would not be needed for anything else during type checking. On the other hand, the eliminator approach is also not perfectly satisfying since everything is cluttered by "meaningless terms".

I think that no-confusion bloat can be handled just fine, in terms of performance, in a reasonably efficient implementation (which Coq & Agda aren't).

I wonder about that remark - the possible efficiency gains you are talking about are mostly due to avoiding unfolding everything too eagerly? Or is there something else which Coq and Agda are doing wrong?

Moreover, in setoid TT no-confusion holds by definition, so there's no bloat.

Interesting, how does that work? Is there a good reference you can recommend regarding setoid TT?

AndrasKovacs commented

pattern matching lambdas (too complicated?)

miniTT's splitting is via pattern matching lambdas. They're simpler IMO than Coq-style matching, where the matching is not tied to a lambda.
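
A minimal sketch of what this looks like as core syntax, with hypothetical constructor names: the only matching construct is a lambda whose body is a single flat case analysis on its argument.

type Name = String

data Tm
  = Var Name
  | App Tm Tm
  | Lam Name Tm
  | Con Name [Tm]               -- saturated constructor application
  | Split [(Name, [Name], Tm)]  -- λ{ C x₁ … xₙ → rhs ; … }, one flat branch
                                -- per constructor, binding its fields
  -- β-rule: App (Split bs) (Con c as) reduces to the rhs of the branch
  -- for c, with its bound names replaced by as; applied to a neutral
  -- scrutinee, the Split stays stuck.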

the eliminator approach is also not perfectly satisfying since everything is cluttered by "meaningless terms".

I wouldn't view any of that as meaningless. No-confusion may appear to be clutter in a simple setting, but when we go to e.g. cubical Agda, no-confusion can be false for some (higher) inductive types, and essential for transport computation in others.

Interesting, how does that work? Is there a good reference you can recommend regarding setoid TT?

For any inductive type constructor without quotients, we can define the equality type recursively on constructors, computing automatically along constructor injectivity and disjointness. For example:

(nil       = nil)       ≡ ⊤
(cons x xs = cons y ys) ≡ (x = y) ∧ (xs = ys)
(cons _ _  = nil)       ≡ ⊥
(nil       = cons _ _)  ≡ ⊥

For quotient inductive types, no-confusion might be false, so there we only assume that equality is a congruence on constructors.
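
As an implementation-side sketch of the equations above (hypothetical names, lists over a stand-in element type): the equality type is computed by recursion on the constructors of the two values.

type Elem = Int                -- stand-in element type
data Ty   = Top | Bot | And Ty Ty   -- ⊤, ⊥, ∧
data Val  = VNil | VCons Elem Val

eqElem :: Elem -> Elem -> Ty   -- stand-in element equality
eqElem x y = if x == y then Top else Bot

-- Equality of lists computes along injectivity and disjointness of
-- the constructors, exactly as in the four equations above.
eqList :: Val -> Val -> Ty
eqList VNil         VNil         = Top
eqList (VCons x xs) (VCons y ys) = And (eqElem x y) (eqList xs ys)
eqList _            _            = Bot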

Here's a quick overview. The setoid TT paper is nice, but its syntax is rather unwieldy in practice. My implementations differ significantly, and have some features which aren't yet covered by any rigorous metatheoretic work that I know of.

the possible efficiency gains you are talking about are mostly due to avoiding unfolding everything too eagerly? Or is there something else which Coq and Agda are doing wrong?

Unfolding is important, but there are plenty of performance issues which aren't directly related. You can see some Coq examples here; the recent issues by JasonGross are especially spectacular. Many problems are quite fundamental and hard to correct. For example Agda has to shift the entire preterm body when going under a binder during elaboration, because of its basic setup. Also, Agda doesn't use de Bruijn levels, using only indices, which makes weakening-free evaluation impossible.
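
To illustrate the levels point, here is a standard NbE sketch (hypothetical names) in which terms use de Bruijn indices but values use levels in neutral variables, so evaluation and quoting never shift or weaken anything:

-- Terms use de Bruijn indices; values use levels in neutral variables.
data Tm  = Var Int | App Tm Tm | Lam Tm
data Val = VVar Int | VApp Val Val | VLam (Val -> Val)

eval :: [Val] -> Tm -> Val
eval env (Var i)   = env !! i
eval env (App t u) = case eval env t of
  VLam f -> f (eval env u)
  v      -> VApp v (eval env u)
eval env (Lam t)   = VLam (\v -> eval (v : env) t)

-- 'l' counts the enclosing binders; a level x converts to the
-- index l - x - 1. This conversion in quote is the only place
-- where indices and levels meet; no shifting occurs anywhere.
quote :: Int -> Val -> Tm
quote l (VVar x)   = Var (l - x - 1)
quote l (VApp t u) = App (quote l t) (quote l u)
quote l (VLam f)   = Lam (quote (l + 1) (f (VVar l)))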

minad commented

MiniTT's splitting is via pattern matching lambdas. They're simpler IMO than Coq-style matching, where the matching is not tied to a lambda.

Regarding pattern matching - I am confused. How do you define pattern matching? For me, pattern matching is one or more arguments being matched on, potentially with nesting, while case splits are single-constructor, non-nested matches. In the core representation I would only want either these simple case splits or eliminators. Therefore I am not classifying miniTT as pattern matching.

I wouldn't view any of that as meaningless. No-confusion may appear to be clutter in a simple setting, but when we go to e.g. cubical Agda, no-confusion can be false for some (higher) inductive types, and essential for transport computation in others.

Agreed. They are obviously not meaningless, otherwise they wouldn't be needed - I mean that they can be constructed quite easily and are usually not useful to show to the user, at least in the simple setting of inductive families. I see this somewhat as an abstraction leakage problem, where the underlying type theory leaks through while the user is offered this nice high-level pattern matching. On the other hand, it is quite instructive to be able to look at every level of a construction, which is not possible when the theory is externalised and everything is baked into the typechecker.

You can see some Coq examples here; the recent issues by JasonGross are especially spectacular. Many problems are quite fundamental and hard to correct.

Ouch, there are quite a few. These issues are really nice, with the superlinear graphs all over the place :) But I don't have enough insight to tell whether there is some fundamental issue going on here, or whether it is just an isolated suboptimal algorithm that can easily be replaced by something better.

For example Agda has to shift the entire preterm body when going under a binder during elaboration, because of its basic setup. Also, Agda doesn't use de Bruijn levels, using only indices, which makes weakening-free evaluation impossible.

Interesting, what is the reason for doing that? While walking down terms during elaboration or checking, it seems quite natural to use levels. It seems to me that Agda is just trying to do too many things at once, all in one big typechecker. But once again, I don't have the insight here. I've only looked briefly at the Agda typechecker a few times, but never got very far.

Thank you for the explanations/references regarding setoid TT - I will take a look! I have to think about how pattern matching can be compiled down there - but basically you are saying that a separate no-confusion construction is unnecessary, since equality already gives you that?

AndrasKovacs commented

How do you define pattern matching?

I tend to use "pattern matching" for anything which is not eliminators.

some fundamental issue going on here

As an example, this issue is simple and clearly indicates a fundamental problem.

It seems to me that Agda is just trying to do too many things at once, all in one big typechecker

I don't think that's the case; I would actually even compress the pipeline a bit more, as Agda uses a separate scope-aware pass for operator parsing, and I'd instead do scope-sensitive parsing in one go. The design choice to only use indices is probably as old as Agda 2. Back then the NbE elaboration approach was extremely obscure, and probably at the time it hadn't been implemented anywhere.

separate no-confusion construction is unnecessary, since equality already gives you that?

That's correct. I admit though that this only eliminates the no-confusion proof terms, but the general boilerplate remains. E.g. if I want to write head for Vec, I still need to use a ⊥-elim where otherwise I'd need no-confusion.

VecElim : ∀{A}(B : ∀ {n} → Vec A n → Set)
          → B nil
          → (∀ {n} a as → B {n} as → B (cons a as))
          → ∀ {n} as → B {n} as

head : ∀ {A n} → Vec A (suc n) → A
head {A}{n} as =
  VecElim (λ {n} _ → ∀ m → n ≡ suc m → A)
          (λ m eq → ⊥-elim eq)    -- in MLTT I'd need 0≢suc here; in setoid TT, (0 ≡ suc m) is ⊥ by definition
          (λ a as _ _ _ → a)
          as n refl
minad commented

I don't think that's the case; I would actually even compress the pipeline a bit more, as Agda uses a separate scope-aware pass for operator parsing, and I'd instead do scope-sensitive parsing in one go. The design choice to only use indices is probably as old as Agda 2. Back then the NbE elaboration approach was extremely obscure, and probably at the time it hadn't been implemented anywhere.

Well, that's only parsing. Probably things can be compressed there, but in my opinion splitting the pipeline is not necessarily a reason for a slow compiler (see for example the nanopass-based Chez Scheme compiler). I think Haskell is simply dropping too much performance on the floor during basic operations. Also, many of the standard libraries are not fast. Take for example serialization, with binary and cereal. I forked persist from store to fix that for me, but it is probably still not as good as it should be - in particular regarding deduplication, which is not available.

Generally, about doing many things at once - this makes me concerned about the consistency of the core. And it seems to me that the Agda people are also looking into minimizing the core theory somehow - at least I've read in some of the issues where ⊥ became provable that it would be nice to have a smaller trusted kernel.

I still need to use a ⊥-elim where otherwise I'd need no-confusion.

Yes, I expected that - that's why I asked. It is still nice to have no additional constructions. But even in CIC, where one needs no-confusion terms, they are not that huge. And when displaying the eliminators, they could somehow be compressed, e.g., by hiding the coercions and impossible branches.