lichess-org/chess-openings

Ensure openings follow parent line

allanjoseph98 opened this issue · 5 comments

The DB already follows the format Opening family: variation1, var2, var3... It follows that French Defense: Advance Variation derives from French Defense. We should also ensure that the corresponding PGNs follow this logic: 1. e4 e6 2. d4 d5 3. e5 should be derived from 1. e4 e6 2. d4 d5 and not 1. d4 e6 2. e4 d5. This is already the case for most openings. However, there are still a lot of bad transpositions.
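To make the naming convention concrete, here is a minimal sketch of how a parent name could be derived from an opening name. The function name `parent_name` is hypothetical, and the sketch assumes that ": " separates the family from its variations and ", " separates the variations from each other.

```python
from typing import Optional


def parent_name(name: str) -> Optional[str]:
    """Return the name one level up, or None for a bare family name.

    Assumes the "Family: var1, var2, ..." convention described above.
    """
    if ": " not in name:
        return None  # already an opening family, nothing above it
    family, variations = name.split(": ", 1)
    parts = variations.split(", ")
    if len(parts) == 1:
        return family  # "Family: var1" -> "Family"
    # Drop the last variation: "Family: var1, var2" -> "Family: var1"
    return family + ": " + ", ".join(parts[:-1])


print(parent_name("French Defense: Advance Variation"))
```

Under these assumptions, the parent of "French Defense: Advance Variation" is "French Defense", and a bare family name has no parent.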

To fix them we should:

  1. Ensure every Opening family has a unique shortest pgn #55 - WIP
  2. Use the fixed families to fix bad var1's, use fixed var1's to fix var2's etc.
  3. If name ends in "...Gambit", ensure "...Gambit Accepted" and "...Gambit Declined" are children of "...Gambit"

Roadblocks to the above:
I ran into some problems related to two major opening families for black: The Modern and the Pterodactyl Defense. Study here - https://lichess.org/study/WrOHG3qw

These are well-established openings (40 lines + 34 lines in the db) with multiple sources corroborating that they both start with 1. e4 as well as 1. d4. Oftentimes lines will transpose into each other. Seemingly the only way to tell a 1. e4 Modern apart from a 1. d4 line is by its ECO.

I personally do not think that ECO is enough to know the move order. Almost every other opening's move order can be inferred from its name. Not to mention, it does not help when trying to group openings by family, for example for the lichess opening pages that were in development.

In the study, I've outlined a possible solution but it would require breaking precedent from all other chess sites. Even then, it is not perfect. Would really love some feedback/ideas. Or whether it's best to just leave the ambiguity.

Rules 1-3 make a lot of sense to me, and they could be added to the automated linter.

Considering an alternative to the proposals ... if we give up on sorting and grouping by ECO, what other ordering seems best, and how should we split the files? Order by moves, and split by first move?

Ordering by moves seems to be the most logical. I'd be wary about trying to add too much logic to it and recreating https://xkcd.com/927/.

Going by first moves alone, there'd be

  1. e4 1888 openings
  2. d4 1078 openings
  3. __ Remaining 426 openings

Ordered by pgn, name and eco (for those who want it). It should be much more readable/accessible, especially once bad transpositions are fixed.
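A rough sketch of that split, assuming each row is an (eco, name, pgn) tuple as in the tsv files. The helper names and the "other" bucket label are made up for illustration, and sorting by the raw pgn string is only one possible reading of "ordered by moves".

```python
from collections import defaultdict


def first_move(pgn: str) -> str:
    # "1. e4 e6 2. d4 d5" -> "e4" (token 0 is the move number "1.")
    return pgn.split()[1]


def split_by_first_move(rows, major=("e4", "d4")):
    """Group (eco, name, pgn) rows by first move, each group sorted by pgn, name, eco."""
    groups = defaultdict(list)
    for eco, name, pgn in rows:
        move = first_move(pgn)
        key = move if move in major else "other"  # hypothetical bucket name
        groups[key].append((pgn, name, eco))
    return {key: sorted(entries) for key, entries in groups.items()}


# Rows quoted elsewhere in this thread.
ROWS = [
    ("C40", "Latvian Gambit: Mason Countergambit", "1. e4 e5 2. Nf3 f5 3. d4"),
    ("A56", "Benoni Defense", "1. d4 Nf6 2. c4 c5"),
    ("A43", "Benoni Defense: Old Benoni", "1. d4 c5"),
]

for key, entries in split_by_first_move(ROWS).items():
    print(key, len(entries))
```

Plain string comparison works here because two PGNs that agree up to some point have the same move number next, so the first difference always falls on a move rather than on a number.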

Revisiting this and having some doubts ...

Regarding ECO: I was hoping we could remove it entirely, and asked around a bit. For all its flaws, people are still using it to reference other material, so it needs to stay 😞

Regarding 2: I am no longer sure that it's really the case that child variations by name are necessarily child variations by moves (in the canonical or most common move order).

Regarding 3: To clarify, how would we deal with something like the following?

c.tsv:C40       Latvian Gambit: Mason Countergambit     1. e4 e5 2. Nf3 f5 3. d4

Rule 2:

  1. Take for example:
A56 | Benoni Defense | 1. d4 Nf6 2. c4 c5
A43 | Benoni Defense: Old Benoni | 1. d4 c5

A43 and some other Old Benonis are violations of rule 2 because of A56. Simple solution: Rename them to their own family: Old Benoni Defense. They aren't children by moves, so why should they be children by name?

  2. Undoubtedly there will be some vagaries. I haven't dived in too deep on this (there are 170 mismatches for rule 2 by my count for just var1s). This has to be looked at slowly and case by case, but from correcting previous openings, I think simple solutions like this exist for most openings.
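The rule 2 check itself can be sketched as a prefix comparison on move lists. This is only an illustration: the function names are hypothetical, and requiring the child to be strictly longer than the parent is an assumption about how the rule should be read.

```python
def moves(pgn: str) -> list[str]:
    """Strip move numbers: "1. d4 c5" -> ["d4", "c5"]."""
    return [tok for tok in pgn.split() if not tok.endswith(".")]


def is_continuation(parent_pgn: str, child_pgn: str) -> bool:
    """True if the child line starts with every move of the parent line."""
    parent, child = moves(parent_pgn), moves(child_pgn)
    return len(child) > len(parent) and child[: len(parent)] == parent


# French Defense -> French Defense: Advance Variation
print(is_continuation("1. e4 e6 2. d4 d5", "1. e4 e6 2. d4 d5 3. e5"))
# Benoni Defense (A56) -> Benoni Defense: Old Benoni (A43)
print(is_continuation("1. d4 Nf6 2. c4 c5", "1. d4 c5"))
```

The French pair passes, while the Benoni pair fails because 1. d4 c5 does not start with 1. d4 Nf6, which is the mismatch described above.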

Rule 3 and Latvian Gambit: Mason Countergambit:

  1. The opening would be checked for rule 2. Since the opening family Latvian Gambit runs up to ...f5, var1 Mason Countergambit with 3. d4 is a direct continuation. It would pass Rule 2.
  2. I had included rule 3 because Gambits being accepted or declined are treated as their own Families/variations: ...Gambit, ...Gambit Accepted and ...Gambit Declined are not separated by : or , so they would not be caught by rule 2. See https://lichess.org/opening/tree for better visualisation (Benko, King's Gambit, etc.)
  3. Rule 3 does not apply here. When would rule 3 apply? When name matches regex 'Gambit\s(Accepted|Declined)$'
  4. Countergambits should ostensibly also be gambit continuations under "gambit declined" but they are named and separated by : or , so there's no need to include them in rule 3 as they will be caught by rule 2.
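As a sketch of how rule 3 could pick out its candidates, the snippet below matches names ending in an accepted/declined suffix and returns the "...Gambit" name they would hang under. The function name `gambit_parent` is hypothetical, and tolerating "Denied" alongside "Declined" is an assumption, not something the database is known to need.

```python
import re
from typing import Optional

# Captures everything up to and including "Gambit" when the name ends
# in an accepted/declined suffix. "Denied" is tolerated as an assumption.
GAMBIT_SUFFIX = re.compile(r"(.*Gambit)\s(?:Accepted|Declined|Denied)$")


def gambit_parent(name: str) -> Optional[str]:
    """Return the '...Gambit' parent rule 3 would expect, or None if rule 3 does not apply."""
    match = GAMBIT_SUFFIX.fullmatch(name)
    return match.group(1) if match else None


print(gambit_parent("King's Gambit Accepted"))
print(gambit_parent("Latvian Gambit: Mason Countergambit"))
```

The Mason Countergambit returns None, so it is left to rule 2, in line with the points above.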