jj-vcs/jj

FR: Revset for divergent commits

Opened this issue · 9 comments

We apparently don't have a revset function for the set of divergent commits yet. We should add that.

This came up in #4965. Filing this to track it.

Related to the use-case in #4965, you could imagine wanting to clean up divergent commits by e.g. selecting whichever one has the most recent author/committer timestamp and abandoning the rest.

[currently] In the current revset language, it may be quite difficult to express that:

  • We don't have a data structure in the revset language (AFAIK?) to represent a result type of "a set of sets of commits", where each set corresponds to the commits for a given divergent change ID. (However, the template language might have lists/dicts?)
  • Even if we can represent the set of sets, we don't have a way to execute an operation on each set (find a representative element to preserve).
  • Even if we can execute an element on each operation of the set, we don't have a way to accumulate/fold/reduce the results into a single value.

[hypothetically] You can add more control structures and data types to the revset language to model imperative computations. I suspect the most natural approach would involve adding lambdas to the language, at least.

[hypothetically] A logic programming or relational model might be able to express the above more straightforwardly than the current revset language seems to naturally support. For example, you could express this:

$$ \begin{align} \text{commitsdiverge}(x, y) &= x \neq y \land x\text{.changeid} = y\text{.changeid} \\ \text{divergent}() &= \left\lbrace x : \text{Commit} | \exists y : \text{Commit} ~ \left[ \text{commitsdiverge}(x, y) \right] \right\rbrace \\ \text{latestdivergent}(D) &= \left\lbrace x \in D | \forall y \in D ~ \left[ \text{commitsdiverge}(x, y) \rightarrow x\text{.date} > y\text{.date} \right] \right\rbrace \end{align} $$

and write jj abandon -r 'latestdivergent(divergent())'. It's interesting to note that we've added richer "control structures" (quantifiers) to the language, without actually adding richer data structures, unlike the natural imperative extensions to the revset language. I would vaguely be in favor of switching to a logic programming paradigm, but there are significant trade-offs as well, and it's obviously that's a larger discussion that goes beyond the scope of this issue.

[action item] We should catalogue any more use-cases that aren't served well by the current revset model, which might help us decide if we want to switch paradigms for the revset language. (Unless it turns out you can handle the above case directly, in which case ignore this and let me know how 😅.)

  • Even if we can represent the set of sets, we don't have a way to execute an operation on each set (find a representative element to preserve).

  • Even if we can execute an element on each operation of the set, we don't have a way to accumulate/fold/reduce the results into a single value.

Isn't this what jj run is?

Also, there's a feature request for "Make jj commands more composable"

Isn't this what jj run is?

In the current design of jj run:

  • It accepts only a single revset, rather than a set of revsets or an alternative data structure, so it wouldn't be able to accept a set of revsets with each individual revset corresponding to a given divergent change ID.
  • It can only map one input commit to one output value, without taking into consideration any other input commit, so you can't e.g. find the maximum commit in an input set according to an ordering predicate.

So the current jj run design doesn't provide any new functionality compared to the revset or templating languages in terms of the input or output structure. Rather, the main feature is doing the processing via arbitrary commands (possibly in the working copy, with side-effects, etc.), than just by processing the commit metadata and graph topology in-memory with the existing languages (which are limited and side-effect-free).

Conceivably, someone could design jj run in a way to handle the data structure and arity problems, though. I could imagine a design where jj run is also used in workflows for processing the commit graph in ways that are more powerful than the revset language provides, without specifically relying on side-effectful commands.

yuja commented

jj abandon -r 'latestdivergent(divergent())'

Maybe I missed the point, but with the current revset language, this query might be expressed as divergent(latest()). Predicates like bookmarks().filter(pat) are usually put into the argument list.

Maybe I missed the point, but with the current revset language, this query might be expressed as divergent(latest()). Predicates like bookmarks().filter(pat) are usually put into the argument list.

I believe that's something different. Suppose you have X1, X2, Y3, Y4, with X/Y being the divergent change IDs and X1 having been committed before X2, etc., and you want to automatically abandon X1/Y3 in favor of the newer X2/Y4, then I don't think there's currently a way to express that. You first have to group by change ID (like {X1, X2}, {Y3, Y4}), then select the latest element from each of those groups.

In my case, I want to select only {X2, Y4} (or the complement, doesn't matter). In your case, I think it first determines {Y4} as the latest commit(s), then selects Y4 from it as a divergent commit, which doesn't help you select only the latest version of each divergent change on a per-change basis.

yuja commented

I mean we can set arbitrary domain within function arguments. For example, divergent(expr) can be defined as a function that selects divergent commits based on expr where expr is a DSL similar to revset, but be evaluated for each divergent pair. This is basically the same as bookmarks(pat). The pat is evaluated for each bookmark, not revision.

For example, divergent(expr) can be defined as a function that selects divergent commits based on expr where expr is a DSL similar to revset, but be evaluated for each divergent pair.

I'm skeptical that we want to invent a DSL for binary relations to represent the pairwise reduction just for divergent. The text pattern DSL seems more permissible because it's (presumably) applicable to many revset functions, and is a relatively simple unary predicate. But I'm open to hearing if there's an intuitive DSL to handle this case generally.

yuja commented

I'm skeptical that we want to invent a DSL for binary relations to represent the pairwise reduction just for divergent.

I agree about that. It doesn't have to be a binary relation, but is still slightly different than revset expression (because the latest(x) revset cannot be expressed well in divergent(latest(???)), and divergent(::???) doesn't make sense.)

My feeling is that latestdivergent(divergent()) has the same issue. latestdivergent(D) wouldn't be generally useful.

My feeling is that latestdivergent(divergent()) has the same issue. latestdivergent(D) wouldn't be generally useful.

Sorry, I think I was unclear in what I was proposing. I agree that latestdivergent isn't generally useful. Rather than proposing implementing the functions on the left-hand side of the =s, what I meant to propose was adding support for the logical operators on the right-hand side of all the =s ($\forall, \exists$, set-builder notation).

The specific example demonstrates a way that we can solve this class of problem with logic programming as a potential way to extend the language, as we don't have language features to solve the problem at present. Of course, it's possible to take many other approaches (imperative, functional, relational, etc.).

I can make a separate issue to track it (EDIT: posted #4980). It came to mind for this issue because I thought to myself that I often want to operate on divergent commits in the way I mentioned in my original comment, but I can't do it with the current revset language.