Three-way merge for aspell personal dictionaries

This is a very simple, non-interactive merge program for aspell personal dictionaries (.pws files). It is meant for people who sync their personal aspell dictionaries over multiple computers using a vcs (eg, git) or some cloud service. Regular merge programs are easily confused by aspell files, who are just an unordered list with an header on the first line (they get especially confused by that header, since it contains the number of words in the file, so it changes when adding/removing words).

aspell-merge3 automagically merges those files, taking deletions into account, and works well with git (see the section on git integration, below)

Building and installation

Compilation requires only cabal (usually packaged as cabal-install or haskell-cabal) and is as simple as:

$ git clone https://github.com/thblt/aspell-merge3
$ cd aspell-merge3
$ cabal build

Usage

Just as simple. Pass three paths, starting with the common ancestor, and aspell-merge3 will produce a new dictionary file, either to sdtout (the default) or to a location passed to the --output, -o arg.

$ aspell-merge3 original modifiedA modifiedB --output combined

The program will terminate without output and with a non-zero exit code if headers in the three files differ in file format version, locale or encoding.

(If you need to combine two files without a common ancestor, simply pass one of these files as the ancestor, eg aspell-merge3 A A B --output O)

git integration

In your global or user git config (often ~/.config/git/config), add the following lines:

[merge "aspell-merge3"]
  name = A merge driver for aspell personal dictionaries.
  driver = aspell-merge3 %O %A %B --output %A

then, in your repo’s .gitattributes (creating it if needed)

*.pws merge=aspell-merge3

Algorithm

Nothing fancy here. Given three sets of words, one from the ancestor (O), and two from variants we need to merge (A and B), words that match at least one of these conditions are added to the output:

The word present in both A and B.
The word is present in either A or B, and not in O.

Rule 1 is the obvious case where there’s no conflict: either the word was added on both sides, or it was already in O — we don’t need to care. Rule 2 is for words that have been added to one of the sides. The extra condition in that rule, that the word is absent from O, is to propagate deletions: a word present in O, but not in both A and B, has been deleted on one side and that deletion must be propagated to the final result.

In other words, the result of the merge is:

  (A∩B)  ∪ (AΔB)\O
  -----    -------
  Rule 1   Rule 2

(Δ is symmetric difference. AΔB == A∪B\A∩B)