agda/agda-stdlib

List of sub-optimal definitions in `Data.List.Base`

JacquesCarette opened this issue · 28 comments

While looking at PR #2355 from @mildsunrise I noticed a ridiculous number of uses of with. These turn out to not be localized to Data.List.Properties but come from uses of with in Data.List.Base. The vast majority of them can be simplified, with a really nice knock-on effect to simplify quite a few proofs.

But I'm guessing this is yet another case of breaking change, so has to wait until 3.0? I think we should outlaw uses of with in function definitions, if favour of defining explicit 'helper' functions.


This is, I think, the standing criticism from @omelkonian and others (related to the various issue he has posted on the main issue tracker about inspect and our attempts to deprecate it...), but only indirectly alluded to in #1937 / #2123 ... although PRs associated with those have tackled some of the worst offenders (esp. wrt turning with into let, esp. in recursive calls).

In response to @MatthewDaggitt 's cautious nudge, @mildsunrise suggests in #2355 that the refactoring of catMaybe/mapMaybe that I suggested based on Orestis' prelude may in fact have the same behaviour as the existing definitions, so we should investigate that for all the examples you have identified... and if the computational behaviour is the same, then I think that shouldn't count as breaking...

@JacquesCarette Could you make a task list here on the issue of the 'worst'/most egregious examples? Then the analysis could be parallelised across us all... and we could document our arguments for the behaviour being preserved...?
UPDATED: on a PR-by-function basis

Yes if the computational behaviour is the same then it's backwards compatible.

As long as "forcing evaluation by with" and "forcing evaluation through pattern-matching" are 'the same computational behaviour', then yes. They sure have very different usability behaviour!!

Updated with an actual list.

Looking at the list, definitely a case of too much with on a Bool instead of if_then_else_ as in the first bullet of #1937 ... notwithstanding that proofs of properties of such functions may themselves require with to fix the types up properly in each case branch... (and I suppose at some point it might be relevant to document the list of "650--700" uses of with in stdlib I refer to there...)

See also @gallais 's comment there about if_then_else_ potentially being too eager to unfold...

initLast is an early example of using views to write functions... for which the use of with is often (but not always) a corollary side-effect... so I'd be tempted to go after the lower-hanging fruit first ;-)

Re: lower hanging fruit: clearly some uses of with (eg in merge) have already been 'deprecated' in favour of if_then_else_, so I suspect the others might also be 'harmless'.

i would love to see this happen. how would we name the new helper functions? do we have existing modules where we're using helper functions that serve as precedent? also, aren't there compatibility concerns from adding many new exports to a Base module?

Well, the way to replace with by an explicit helper function is already described in the paper which introduces the notation... but as to polluting the namespace, such things may always (?) be encapsulated as part of a local where block, so I think the knock-on effect on the API can be minimised?

The problem with local-only functions is that it's hard to prove things about them. Some of those helpers are independently useful, so those are 'easy', in a sense.

For many of the ones in Data.List.Base, the 'helpers' actually already exist - most of them are just maybe from Data.Maybe.Base !!

initLast is an early example of using views to write functions... for which the use of with is often (but not always) a corollary side-effect... so I'd be tempted to go after the lower-hanging fruit first ;-)

I'm starting to see that there are different kinds of views. At the very least:

  1. structurally obvious sub-computations (where you want to match on the results)
  2. computationally necessary sub-computations (on which you want to match)
  3. change of coordinates, then match

For 1, I'd argue that using views for defining a function (as opposed doing a proof) is an anti-pattern. For 2, like initLast, I am undecided. For 3, I think with makes quite a bit of sense.

Interesting analysis! (and some of the first since VfL to subject this monster to more 'scientific'/empirical scrutiny ;-))

'Anti-patterns in DTFP': a catchy, if niche, book title!?

Don't know if we could fill a book (yet), but certainly a non-trivial paper.

@JacquesCarette for how many of your shopping list do you have PRs ready-to-go? (I don't want to duplicate work... ;-))

Actually, while I did start working on removing some uses of with that seemed like overkill, I specifically avoided Data.List.Base so as to not clash with #2355 . But then I realized that it shouldn't make a difference: all those proofs should still go through as is (as that's what's happened with the others, where I'm likely post a PR soon). One could perhaps improve some of the proofs further (more of them could become equational), but that can wait.

So: go ahead.

What remain to do are:

  • unalignWith (needs a helper function, and then some properties about it commuting with various maps which had me tied up in knots)
  • initLast not sure you can replace with here, except cosmetically via case_returning_of_, but curious if you find an alternative...?
  • unsnoc ditto. but with case_of_

So I'll stop there!

Re: initLast and unsnoc.

unsnoc is for sure definable without with:

unsnoc []       = nothing
unsnoc (x ∷ xs) = just (maybe′ (λ (ys , y)  x ∷ ys , y) ([] , x) (unsnoc xs))

but you can't then define initLast in terms of it, because the InitLast view also encodes an equality which isn't (immediately) observable about unsnoc... so requires a separate proof (in Data.List.Properties?).
@JacquesCarette How might you seek to resolve that?

I've tried multiple things and: I can't definite initLast without using with. The InitLast view computes a lot, and that won't allow me to give any alternate definite (that I can see). I don't think the InitLast view is wrong in any way -- I think this is just inescapable for the change of view from cons to snoc.

Ob: initLast. Nowadays, we'd (more) typically name a function-symbol-quoted-as-a-constructor as <backtick><constructor-name> (I think I learnt this undocumented (?) syntax pattern from @gallais ), so here the view 'should' perhaps be written as:

data InitLast {A : Set a} : List A  Set a where
  `[]    : InitLast []
  _`∷ʳ_ : (xs : List A) (x : A)  InitLast (xs ∷ʳ x)

But not before the hypothetical-rewrite milestone, I suspect... ;-)

Similarly, the constructor (prec: 5) and the operator (prec: 6) are declared with different precedences!?

I've tried multiple things and: I can't definite initLast without using with. The InitLast view computes a lot, and that won't allow me to give any alternate definite (that I can see). I don't think the InitLast view is wrong in any way -- I think this is just inescapable for the change of view from cons to snoc.

I suppose... you could write

initLast′ : (xs : List A)  xs ≡ [] ⊎ ∃₂ λ ys y  xs ≡ ys ∷ʳ y
initLast′ []       = inj₁ refl
initLast′ (x ∷ xs) = Sum.[ lem₁ , lem₂ ]′ (initLast′ xs)
  where
  lem₁ : xs ≡ []  x ∷ xs ≡ [] ⊎ ∃₂ (λ ys y  x ∷ xs ≡ ys ∷ʳ y)
  lem₁ eq = inj₂ ([] , x , cong (x ∷_) eq)
  lem₂ : ∃₂ (λ ys y  xs ≡ ys ∷ʳ y)  x ∷ xs ≡ [] ⊎ ∃₂ (λ ys y  x ∷ xs ≡ ys ∷ʳ y)
  lem₂ (ys , y , eq) = inj₂ (x ∷ ys , y , cong (x ∷_) eq)

unsnoc′ : List A  Maybe (List A × A)
unsnoc′ xs = Sum.[ (λ _  nothing) , (λ (ys , y , _)  just (ys , y)) ]′ (initLast′ xs)

which is morally what already exists, but is with-free, albeit at a non-trivial cost of intelligibility... as well as the (losing the) payoff of view-based programming: why prove something (an equation) that the typechecker can do for you for free?

The problem with local-only functions is that it's hard to prove things about them.

You perhaps mean something else but if the issue is that you cannot refer to them, let
me remind you of named where blocks:

f : ty
f = g
  module F where
    g : ty

h : ty
h = F.g

Putting module F in front of the where suddenly gives you the ability to refer to these
local functions.

The problem with local-only functions is that it's hard to prove things about them.

You perhaps mean something else but if the issue is that you cannot refer to them, let me remind you of named where blocks:

f : ty
f = g
  module F where
    g : ty

h : ty
h = F.g

Putting module F in front of the where suddenly gives you the ability to refer to these local functions.

That is interesting, and something I hadn't known/appreciated before (though I think I'd seen #1166 discuss this feature without really getting what was going on) ... not sure about @JacquesCarette though!

Regarding "polluting the namespace with lots of module names" (#1166 ), can such declarations also be marked private? (Maybe that's moot if/while we retain a Base/Properties separation though!?)... Ah, OK documentation here

So to keep this thread alive until we can close it sooner rather than later: suggest

  • remove initLast and unsnoc from the list above
  • fix unalignWith

Then close?

Agreed - did the first bit, no time right now to do the second.

@mechvel 's issue #2415 should make us reconsider this issue and its associated PR #2365 but playing with @Taneb 's version of the regression tests suggests that we should reconsider the explicit/implicit quantification in the properties of each of the Decidable functions because eg the analogues of the filter-accept lemmas for partition yield unsolved metas regardless of whether the new or old definitions is used (UPDATED: see #2427)... while the derun properties in their existing form use explicit quantification, even though their proofs do not require it... so the overall picture seems highly... unclear (to me, at least).

so the overall picture seems highly... unclear (to me, at least).

Ditto.

Actually better I think to leave this open, with the filter etc. box left unticked... we should return to this in v2.2 once we've figured out #2427 ...