DistributedProofreaders/guiguts-py

PPtxt Hyphenation check is over-sensitive


Via Slack:

The hyphenated/non-hyphenated words check in the PPtxt tool ("Tools" menu) seems to be overly sensitive and produces many false positives. The PPtxt tool in the PP Workbook identified only one potential hyphenation error, whereas the GG2 version flagged dozens.

history_i_1-utf8.zip

This has been fixed but not yet applied to the master version of GG2.

The hyphenated/non-hyphenated words check in PPtxt is only supposed to check for spelling differences such as "motor-car" and "motorcar". The flagging by the PP Workbook (PPWB) version of PPtxt of

Col (6) ❬-❭ -Col (1)
  2771: retreat by order of Lieut.-Col. Campbell, who was next in rank, and
        -----
  1386: The other sons of Sir William were “Col. Francis” (of “Lucasta”), Thomas
  1387: and Dudley. The Dictionary of National Biography only knows of Col.
  1801: Nov. 14, 1744, d. s. p. June 26, 1834; third wife of Col. John Bayard
  1834: The fifth, Joanna, baptized Jan. 31, 1694, married in 1716, Col. Anthony

is a bug. The GG2 version also flags this incorrectly, along with many similar cases. That bug has been fixed in GG2.

The contents of history_i_1-utf8.zip should no longer have any words flagged by the hyphenated/non-hyphenated words check in the GG2 version of PPtxt.

The bug fix may have other, benign effects on GG2 PPtxt output, because it reduces the number of 'words' in the dictionary of words and their counts found in a book. Many checks in PPtxt use this dictionary. The 'words' that are no longer in the dictionary are the parts of hyphenated words, which GG2 PPtxt deliberately added to the dictionary but should not have.
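As a rough illustration of the intended behaviour (not the actual PPtxt code), the Python sketch below builds a word-count dictionary that keeps each hyphenated word as a single entry, then reports only pairs such as "motor-car"/"motorcar" where both spellings occur. The names `WORD_RE`, `count_words` and `hyphenation_pairs` are hypothetical, and the tokenizer is an assumption made for this example.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not PPtxt's own rules):
# runs of letters, optionally joined by single internal hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def count_words(text: str) -> Counter:
    """Count words, keeping each hyphenated word as ONE dictionary entry.

    The parts of a hyphenated word ("motor", "car") are not added separately.
    """
    return Counter(WORD_RE.findall(text))


def hyphenation_pairs(counts: Counter) -> list[tuple[str, str]]:
    """Return (hyphenated, joined) pairs where both spellings occur."""
    # Compare case-insensitively by merging counts under lowercase keys.
    lowered: Counter = Counter()
    for word, n in counts.items():
        lowered[word.lower()] += n

    pairs = []
    for word in sorted(lowered):
        if "-" in word:
            joined = word.replace("-", "")
            if joined in lowered:
                pairs.append((word, joined))
    return pairs


sample = (
    "The motor-car stopped. A motorcar passed. "
    "Lieut.-Col. Campbell met Col. Francis."
)
print(hyphenation_pairs(count_words(sample)))
```

With this sample, only the pair ("motor-car", "motorcar") is reported. "Lieut.-Col." and "Col." produce no report, because under this tokenizer a word cannot start with a hyphen, so no "-Col" entry is ever created.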

The splitting of "Lieut.-Col." into "Lieut." and "-Col" by the PPWB PPtxt is inadvertent and caused by a separate bug.