Putting the *-release.cg3 mechanism into use
Trondtr opened this issue · 8 comments
Today the grammar checker file comes in two shapes, `grammarchecker.cg3` and `grammarchecker-release.cg3`. The setup is not used as intended: for sme we work on the latter file, for the other languages on the former. What is needed is the following:
- We all work on the file `grammarchecker.cg3`.
- We mark all `ADD` rules not fit for release, e.g. by starting the rule name with `x` (`ADD:x...`).
- In the make routine, we add a procedure that comments out all `ADD` rules marked with `x`, thereby generating a `grammarchecker-release.cg3` file that is strictly not for editing and contains only the rules marked for publication.
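The commenting-out step in the last item could look roughly like the function below. This is a minimal sketch under stated assumptions, not the gawk script mentioned later in this thread: it assumes a rule that is not fit for release starts with `ADD:x` at the beginning of a line (indentation allowed), and that a rule ends at the first `;` that is not escaped as `\;`, not inside a quoted tag such as `"<;>"`, and not inside a `#` comment. The name `comment_out_x_rules` is hypothetical. It prefixes every line of a marked rule with `#` and never joins or drops lines.

```sh
# Hypothetical helper: comment out every line of each ADD rule whose name
# starts with "x" (ADD:x...), writing exactly one output line per input line.
# Assumptions: a marked rule starts with "ADD:x" at the beginning of a line
# (possibly indented) and ends at the first ";" outside comments, quoted
# strings and escaped "\;".
comment_out_x_rules() {
  awk '
    {
      # Reduce the line to its code part before looking for rule starts/ends.
      code = $0
      gsub(/\\;/, "", code)        # escaped semicolons do not end a rule
      gsub(/"[^"]*"/, "", code)    # neither do semicolons in quoted tags such as "<;>"
      sub(/^#.*/, "", code)        # comment at the start of the line
      sub(/[ \t]#.*/, "", code)    # comment after whitespace
      if (!inrule && code ~ /^[ \t]*ADD:x/) inrule = 1
      if (inrule) {
        print "#" $0
        if (code ~ /;/) inrule = 0   # the rule ends on this line
      } else {
        print
      }
    }
  ' "$@"
}
```

Usage would be `comment_out_x_rules grammarchecker.cg3 > grammarchecker-release.cg3`. Because multi-line rules are commented out line by line instead of being flattened, line N of the release file still corresponds to line N of `grammarchecker.cg3`.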
The issue has become pressing because of the upcoming NoDaLiDa conference: to write sensible articles, we need to focus on a subset of the rules.
For smn, fao and nob, the rules are already marked with `x`, so testing the release procedure requires removing the `x` from one of the rules.
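Before testing, it can help to check whether any `ADD` rule would stay active at all. The rough sketch below counts marked and unmarked `ADD` rules; it assumes every rule starts with its operator at the beginning of a line (indentation allowed) and that the marker is written `ADD:x`. The helper name `count_add_rules` is hypothetical. The pattern requires `:`, `(`, space or tab after `ADD`, so operators such as `ADDCOHORT` are not counted.

```sh
# Hypothetical helper: count ADD rules marked as not ready ("ADD:x...") and
# ADD rules that would stay active in the release file. Assumes every rule
# starts with its operator at the beginning of a line (indentation allowed);
# lines starting with "#" are comments and do not match either pattern.
count_add_rules() {
  awk '
    /^[ \t]*ADD:x/          { marked++; next }
    /^[ \t]*ADD[:( \t]/     { released++ }
    END { printf "marked with x: %d, released: %d\n", marked, released }
  ' "$@"
}
```

If `count_add_rules grammarchecker.cg3` reports 0 released rules, the generated file has no active `ADD` rules, and one `x` has to be removed before the release procedure can be tested.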
Not being a programmer, I wrote a command line that did the trick: it removes all comments, puts each rule (operator plus rule content up to the closing `;`) on a single line, and comments out all x-marked rules. The fuss is needed because the rules in the files span several lines. What is still missing is cleaning this up and adding `grammarchecker-release.cg3` to the makefile setup.
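```sh
# Placeholders: ASCII control characters, so that byte-oriented tools
# such as GNU cut and tr accept them as single characters.
CMT=$(printf '\001')   # marks where a comment starts
EOR=$(printf '\002')   # marks the end of a rule

tr '\t' ' ' < grammarchecker.cg3 |
sed 's/\\;/semicolon/g' |             # protect escaped semicolons (\;)
sed "s/^#/$CMT/" |                     # mark comments at the start of a line
sed "s/ #/ $CMT/g" |                   # mark comments preceded by a space
sed "s/;/;$EOR$CMT/g" |                # mark rule ends; the rest of the line is cut like a comment
cut -d"$CMT" -f1 |                     # drop everything from the first comment marker on
tr '\n' ' ' |                          # join all lines into one
sed "s/\(SECTION[^ ]*\) /\1$EOR/g" |   # keep SECTION headers on their own line
tr "$EOR" '\n' |                       # one rule per line
sed 's/^ *//' |                        # strip leading spaces
sed 's/ADD:x/#ADD:x/' |                # comment out x-marked ADD rules
uniq |                                 # collapse adjacent duplicate lines (e.g. empty ones)
sed 's/semicolon/\\;/g' > grammarchecker-release.cg3   # restore escaped semicolons
```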
@flammie could you have a look at this? See also the following commit, especially the commit message:
(or the following set of commits: giellalt/lang-smn@4f19518...16d9d8e)
I'm not sure which of the two approaches is more user-friendly when editing a CG file. Whatever you choose, the overall goal is simplicity for the CG/grammar checker developer.
Programmatically, the goal is to automatically create a derived grammar checker file used for production, as a copy of the development version, but with unfinished rules either commented out or removed. The dev rules should be marked somehow to make the conversion automatic.
@lynnda-hill sending this to @flammie 😄
I wrote a gawk script that handles a few more cases of ADD:x rules. On IRC we planned a slightly more elegant solution, using either future CG tooling or CG's own parser (e.g. `vislcg3 --dump-ast`).
Nice. Using the CG tooling somehow seems like a good idea, since the CG file parsing is then already in place. Keep in mind that the derived grammar checker / CG file still needs to be debuggable and traceable, preferably with either the generated file or the original source file as the reference (whichever is easiest): the production version also needs to be tested and debugged when needed 🙂
Luckily, both ways should preserve line numbers and identifiers for tracing.
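Whichever way the release file is derived, the line-number property can be checked mechanically. A minimal sketch, assuming the release file is produced only by prepending `#` to lines of the source; `check_traceable` is a hypothetical name. It prints every line that was changed in some other way, reports a differing line count, and returns non-zero in either case, so it could serve as a test step after generation.

```sh
# Hypothetical check: the release file must have as many lines as the source,
# and every line must be either unchanged or the source line with "#" prepended.
# Prints the offending line numbers and returns non-zero if the check fails.
check_traceable() {
  awk '
    FILENAME == ARGV[1] { src[FNR] = $0; n = FNR; next }
    {
      m = FNR
      if ($0 != src[FNR] && $0 != ("#" src[FNR])) {
        printf "line %d: not just commented out\n", FNR
        bad = 1
      }
    }
    END {
      if (m != n) {
        printf "line count differs: source %d, release %d\n", n, m
        bad = 1
      }
      exit (bad ? 1 : 0)
    }
  ' "$1" "$2"
}
```

For example, `check_traceable grammarchecker.cg3 grammarchecker-release.cg3`. A file produced by a flattening pipeline, which joins each rule onto one line, fails this check, which is exactly the traceability problem raised above.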
I moved the script from SMN to Giella-core.
This is now implemented for all Sámi languages with a grammar checker. Closing.
Note that nob and fao also have grammar checkers.