Prerequisites: OCaml, ocamlgraph, Mathematica, pdflatex, swipl.

===========================================
==== Working with the bundled grammars ====
===========================================

To try things out with the grammars that are included in the repository,
all you should need to do is the following:

(1) Run 'make' in the root directory of your working copy. This should
    produce executables called parse, intersect, and visualize.

(2) Set the first line of renormalize.m to point to your installation of
    Mathematica. (This is unnecessary if MATHSCRIPT is set appropriately
    in renormalize.csh.)

You should now be able to use the parser itself in isolation to do some
simple things like:

  - Parse and display derivation trees to stdout:
        ./parse -g grammars/wmcfg/strauss.wmcfg -i "Jon hit the dog with the stick"

  - Generate an intersection grammar (without renormalizing) from an
    existing grammar and a prefix:
        ./intersect -g grammars/wmcfg/strauss.wmcfg -prefix "Jon hit"

To view the ``predictions'' of a normalized intersection grammar, use the
transitions.sh script, which requires five command-line arguments:

  - a choice of ``mode'', which is either `-kbest' or `-sample';
  - the (pre-intersection) grammar file inside the grammars/wmcfg directory;
  - the initial fragment of the sentence you're interested in;
  - the number of derivations to show; and
  - some ``tag'' to identify this run of the script, which will appear as
    part of the names of all the files that are generated.

For example:
        ./transitions.sh -kbest grammars/wmcfg/larsonian1.wmcfg "the story" 30 MYTAG

This will generate three PDF files, one for each of three positions in the
sentence up to this point (i.e. one for each of the three prefixes "",
"the" and "the story"), each showing the 30 best parses that are possible
from that position.
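To get a feel for how many PDF files to expect for a given fragment, the
following sketch lists the prefixes of a sentence, starting with the empty
one. It is an illustration only: list_prefixes is a hypothetical helper that
is not part of the repository, and it assumes that positions correspond to
whitespace-separated words, as the "the story" example suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print every prefix of a
# sentence in double quotes, one per line, beginning with the empty prefix.
# Assumes words are separated by whitespace and contain no glob characters.
list_prefixes() {
    prefix=""
    printf '"%s"\n' "$prefix"
    for word in $1; do
        if [ -z "$prefix" ]; then
            prefix="$word"
        else
            prefix="$prefix $word"
        fi
        printf '"%s"\n' "$prefix"
    done
}

list_prefixes "the story"
```

A fragment of n words yields n+1 prefixes, which matches the three PDF files
described for the two-word fragment "the story".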

=================================================
==== Adding or modifying Minimalist Grammars ====
=================================================

Before adding to or modifying the existing Minimalist Grammars, you will
need to do the following:

(1) Get Mathieu Guillaumin's MG-to-MCFG translator from
        https://bitbucket.org/mguillau/mg2mcfg
    Download it as a ZIP archive or use the git source code control system:
        git clone https://bitbucket.org/mguillau/mg2mcfg.git guillaumin
    Place this directory so that "guillaumin" and your working copy are
    sister directories. For instance, if your download was called
    0a470e1368b9.zip:
        unzip 0a470e1368b9.zip '*hmg2mcfg/*' -d guillaumin
    then extract the relevant directory for the single program we require:
        mv mguillau-mg2mcfg-0a470e1368b9/hmg2mcfg hmg2mcfg
        rmdir mguillau-mg2mcfg-0a470e1368b9

(2) In the guillaumin/hmg2mcfg directory, run 'make clean' and 'make' to
    compile the translator. This should produce an executable called
    hmg2mcfg.

Now, to set up a new Minimalist Grammar for the parser to work with, you
need to provide one file yourself and then use make to generate three other
derived files. Let's call the root of the working copy $ROOT, and let's
suppose that $NAME is a good name to associate with the new grammar.

The file you need to provide is:

  - The grammar in Prolog format.
        $ROOT/grammars/mg/$NAME.pl

The three derived files that the parser will need can be generated as
follows:

  - The MCFG equivalent to the given MG:
        make grammars/mcfgs/$NAME.mcfg

  - The ``dictionary file'' that specifies how to map derivations in this
    MCFG back to derivations in the original MG:
        make grammars/mcfgs/$NAME.dict

  - The weighted version of the MCFG:
        make grammars/wmcfg/$NAME.wmcfg

==== About Assigning Weights ====

The third make command above will, by default, assign weights to the MCFG
rules in a way that distributes the weight for each nonterminal uniformly
across all the rules that expand it.
This is a handy quick-and-dirty method for generating a WMCFG but is
probably not actually what you want (note that the resulting grammar is
often not consistent).

The alternative is to use a training corpus to provide more sensible
weights. A training corpus is simply a plain text file where each line
consists of a sentence generated by the grammar, preceded by an integer
specifying the sentence's frequency/weight. This should be saved as:
        $ROOT/grammars/train/$NAME.train

The Makefile looks for a file in this location when you try to make the
wmcfg file (the third make command above). If it finds such a file, it uses
it as the training corpus; otherwise, it uses the default quick-and-dirty
method.

If you have some other way of adding weights to the MCFG, that's fine: you
can skip the last of the three make commands above, and instead just place
your $NAME.wmcfg file in grammars/wmcfg directly. Our training tool does not
do anything remotely fancy. You may well have something more advanced.

=============================================================
==== Adding (M)CFGs not derived from Minimalist Grammars ====
=============================================================

Since the core of the system (the functionality encapsulated in the parse
and intersect executables) actually works with MCFGs, you can also provide
MCFGs of your own that are not derived from Minimalist Grammars. The grammar
used in the earlier examples, repeated here, is actually a (W)MCFG that was
not derived from any MG.
        ./parse -g grammars/wmcfg/strauss.wmcfg -i "Jon hit the dog with the stick"
        ./intersect -g grammars/wmcfg/strauss.wmcfg -prefix "Jon hit"

Notice that there are no associated files grammars/{mcfg,mg}/strauss.* of
the sort that connect strauss.wmcfg back to a Minimalist Grammar. This does
not cause a problem for anything you want to do using parse or intersect,
which only use the wmcfg file you provide on the command line itself.
But transitions.sh (and the underlying executable visualize, which does the
heavy lifting there) assumes that the MCFG it is given is the translation of
an MG, and uses the associated files in grammars/{mcfg,mg} to
``un-translate'' the likely derivations back into MG derivations; notice
that the trees in the PDFs produced by transitions.sh show MG derived trees
(or X-bar trees).

==============================
==== For more information ====
==============================

See the CCPC user documentation, available online here:
        https://courses.cit.cornell.edu/jth99/ccpc-doc.pdf