Putting the *-release.cg3 mechanism into use

Question


Trondtr opened this issue 2 years ago · 8 comments

Today the grammar checker file comes in two shapes, grammarchecker.cg3 and grammarchecker-release.cg3. The setup is not used as intended: for sme we work on the latter file, and for the other languages we work on the former. What is needed is the following:

We all work on the file grammarchecker.cg3.
We mark all ADD rules that are not fit for release, e.g. with an initial x in the rule name.
In the make routine, we add a procedure that comments out all ADD rules marked with x, thereby generating a grammarchecker-release.cg3 file that is strictly not for editing and in which only the rules marked for publication are active.
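The make step in the last item could look roughly like the following. This is a minimal sketch, not the script the thread later refers to: the function name `make_release`, the temporary file locations and the sample rules are hypothetical, and it assumes that a marked rule starts with `ADD:x` at the beginning of a line and ends at the first `;` not written as `\;`. Because it prefixes every line of a marked rule with `#` instead of joining lines, the release file keeps the line numbers of grammarchecker.cg3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out every ADD rule whose name starts with "x".
# Assumptions: a marked rule starts with "ADD:x" at the beginning of a line
# (after optional blanks) and ends at the first ";" that is not written "\;".
make_release() {
  awk '
    !inrule && /^[ \t]*ADD:x/ { inrule = 1 }   # a marked rule starts here
    {
      if (!inrule) { print; next }             # unmarked lines pass through
      print "#" $0                             # comment out, but keep the line
      code = $0
      gsub(/\\;/, "", code)                    # ignore escaped semicolons
      p = index(code, "#")                     # ignore a trailing comment
      if (p > 0) code = substr(code, 1, p - 1)
      if (index(code, ";") > 0) inrule = 0     # the rule ends on this line
    }
  ' "$1"
}

tmp=$(mktemp -d)
cat > "$tmp/grammarchecker.cg3" <<'EOF'
SECTION
ADD:x-draft (&msyn-test) TARGET N
    IF (1 V) ;
ADD:done (&msyn-ok) TARGET V ; # finished rule
EOF
make_release "$tmp/grammarchecker.cg3" > "$tmp/grammarchecker-release.cg3"
cat "$tmp/grammarchecker-release.cg3"
# Expected output:
# SECTION
# #ADD:x-draft (&msyn-test) TARGET N
# #    IF (1 V) ;
# ADD:done (&msyn-ok) TARGET V ; # finished rule
```

A `#` inside a quoted tag would be mistaken for a comment by this sketch; a parser-based approach would not have that problem.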

The issue has become urgent because of the upcoming NoDaLiDa conference: in order to write sensible articles, we need to focus on a subset of the rules.

For smn, fao and nob the rules are already marked with x, so testing the release procedure requires removing the x from at least one rule.
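If all rules are x-marked, a release build contains no active ADD rules at all. A quick way to see this, and to confirm the effect of un-marking one rule, is to count the ADD rules that are not x-marked. This is a sketch assuming that rule heads start with `ADD:` at the beginning of a line; the function `count_release_rules`, the rule names and the temporary files are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count ADD rules that would stay active in the release,
# i.e. rule heads starting with "ADD:" but not with "ADD:x".
count_release_rules() {
  grep -E '^[[:space:]]*ADD:' "$1" | grep -vc '^[[:space:]]*ADD:x'
}

tmp=$(mktemp -d)
cat > "$tmp/grammarchecker.cg3" <<'EOF'
ADD:x-one (&msyn-a) TARGET N ;
ADD:x-two (&msyn-b) TARGET V ;
EOF
before=$(count_release_rules "$tmp/grammarchecker.cg3")   # 0: nothing to release

# Un-mark one rule for testing. Write a copy, since sed -i differs between GNU and BSD.
sed 's/^ADD:x-one/ADD:one/' "$tmp/grammarchecker.cg3" > "$tmp/test.cg3"
after=$(count_release_rules "$tmp/test.cg3")              # 1: one rule is released
echo "before: $before, after: $after"
```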

Answer 1 · 2022-11-29T09:26:56.000Z

Not being a programmer, I wrote a command line that did the trick: it removed all comments, put each rule (operator plus rule content up to the closing ;) on a line of its own, and commented out all x-marked rules. The fuss is needed because the files write the rules over several lines. What is still missing is cleaning this up and adding grammarchecker-release.cg3 to the makefile setup.

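 # Needs bash for the $'...' quoting. Two ASCII control characters serve as
 # placeholders; multibyte ones such as ∆ or ∫ break GNU cut and GNU tr,
 # which expect single-byte delimiters and characters.
 NL=$'\036'   # placeholder: "break the line here"
 CM=$'\037'   # placeholder: "a comment starts here"
 tr '\t' ' ' < grammarchecker.cg3 |
  sed 's/\\;/semicolon/g' |                 # protect escaped semicolons
  sed "s/^#/$CM/" |                         # mark whole-line comments
  sed "s/ #/ $CM/g" |                       # mark trailing comments
  sed "s/;/;$NL$CM/g" |                     # after each ";": line break mark, then cut mark
  cut -d"$CM" -f1 |                         # keep only the text before the first cut mark
  tr '\n' ' ' |                             # join everything into one long line
  sed "s/\(SECTION[^ ]*\) /\1$NL/g" |       # give SECTION headers their own line
  tr "$NL" '\n' |                           # one rule per line
  sed 's/^ *//' |                           # strip leading blanks
  sed 's/^ADD:x/#ADD:x/' |                  # comment out x-marked ADD rules
  uniq |                                    # collapse repeated (e.g. empty) lines
  sed 's/semicolon/\\;/g' > grammarchecker-release.cg3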
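Because the pipeline's output has one rule per line, it is easy to check which rules end up active in the release file. A sketch assuming that output format; the sample content below is hypothetical and stands in for a real grammarchecker-release.cg3.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Hypothetical sample in the one-rule-per-line format
cat > "$tmp/grammarchecker-release.cg3" <<'EOF'
SECTION
#ADD:x-draft (&msyn-test) TARGET N IF (1 V) ;
ADD:done (&msyn-ok) TARGET V ;
EOF
# Active rules start with "ADD:"; disabled ones were prefixed with "#"
active=$(grep -o '^ADD:[^ ]*' "$tmp/grammarchecker-release.cg3")
disabled=$(grep -o '^#ADD:[^ ]*' "$tmp/grammarchecker-release.cg3")
printf 'active:   %s\ndisabled: %s\n' "$active" "$disabled"
```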

Answer 2 · 2023-03-07T12:04:28.000Z

@flammie could you have a look at this? See also the following commit, especially the commit message:

giellalt/lang-smn@06651d3

(or the following set of commits: giellalt/lang-smn@4f19518...16d9d8e)

Not sure which of the two approaches is more user-friendly when editing a CG file; whatever you choose to do, the overall goal is simplicity for the CG/grammar checker developer.

Programmatically, the goal is to automatically create a derived grammar checker file for production: a copy of the development version, but with unfinished rules either commented out or removed. The development rules should be marked somehow, so that the conversion can be automatic.

@lynnda-hill sending this to @flammie 😄

Answer 3 · 2023-03-08T15:48:19.000Z

I wrote a gawk script that handles a few more cases of ADD:x rules. On IRC we planned a slightly more elegant solution, using potential future CG tooling or otherwise CG's own parser (e.g. vislcg3 --dump-ast).

Answer 4 · 2023-03-08T16:29:04.000Z

Nice. Using the CG tooling somehow seems like a good idea, since the CG file parsing is then already in place. Keep in mind that the derived grammar checker / CG file still needs to be debuggable and traceable, preferably with either the generated source file or the original source file (whichever is easiest) as a reference. The production version also needs to be tested and debugged if needed 🙂

Answer 5 · 2023-03-09T14:59:35.000Z

Luckily, both ways should preserve line numbers and identifiers for tracing.
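Whether a derived file really keeps the line numbers of its source can be checked mechanically by comparing the two line by line. This is a minimal sketch: `check_traceable` and the sample files are hypothetical, and it only accepts the two strategies discussed here, commenting a line out with `#` or emptying it.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if DERIVED keeps the line numbers of SRC, i.e. it
# has the same number of lines and each line is unchanged, commented out with
# "#", or emptied. Assumes SRC is not empty (the NR == FNR idiom needs that).
check_traceable() {
  awk 'NR == FNR { src[FNR] = $0; n = FNR; next }
       { m = FNR
         if ($0 != src[FNR] && $0 != ("#" src[FNR]) && $0 != "") bad++ }
       END { exit (bad > 0 || m != n) }' "$1" "$2"
}

tmp=$(mktemp -d)
printf 'ADD:x-a (&t) TARGET N\n  IF (1 V) ;\nADD:b (&u) TARGET V ;\n' > "$tmp/src.cg3"
printf '#ADD:x-a (&t) TARGET N\n#  IF (1 V) ;\nADD:b (&u) TARGET V ;\n' > "$tmp/commented.cg3"
printf '\n\nADD:b (&u) TARGET V ;\n' > "$tmp/removed.cg3"
printf 'ADD:b (&u) TARGET V ;\n' > "$tmp/squashed.cg3"

check_traceable "$tmp/src.cg3" "$tmp/commented.cg3" && echo "commented out: traceable"
check_traceable "$tmp/src.cg3" "$tmp/removed.cg3" && echo "emptied: traceable"
check_traceable "$tmp/src.cg3" "$tmp/squashed.cg3" || echo "squashed: line numbers lost"
```

Dropping the lines of a removed rule entirely, as in the last case, shifts every later line number, so emptying them is the safer form of removal if tracing against the original source matters.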

Answer 6 · 2023-03-10T10:44:13.000Z

I moved the script from SMN to Giella-core.

Answer 7 · 2023-09-26T05:55:11.000Z

This is now implemented for all Sámi languages with a grammar checker. Closing.

Answer 8 · 2023-09-26T11:51:44.000Z

Note that nob and fao also have grammar checkers.