Prerequisites: OCaml, ocamlgraph, Mathematica, pdflatex, swipl.

===========================================
==== Working with the bundled grammars ====
===========================================

To try things out with the grammars that are included in the 
repository, all you should need to do is the following:

(1) Run 'make' in the root directory of your working copy. This should 
    produce executables called parse, intersect, and visualize.

(2) Set the first line of renormalize.m to point to your installation 
    of Mathematica. (This is not needed if MATHSCRIPT is set appropriately 
    in renormalize.csh.)

You should now be able to use the parser itself in isolation to do some 
simple things like:

  - Parse and display derivation trees to stdout:
      ./parse -g grammars/wmcfg/strauss.wmcfg -i "Jon hit the dog with the stick"

  - Generate an intersection grammar (without renormalizing) from an existing 
    grammar and a prefix:
      ./intersect -g grammars/wmcfg/strauss.wmcfg -prefix "Jon hit"

To view the ``predictions'' of a normalized intersection grammar, use the 
transitions.sh script, which requires five command-line arguments: a choice of 
``mode'', which is either `-kbest' or `-sample'; the (pre-intersection) grammar 
file, which should be inside the grammars/wmcfg directory; the initial fragment 
of the sentence you're interested in; the number of derivations to show; and 
some ``tag'' to identify this run of the script, which will appear as part of 
the names of all the files that are generated. For example:
      ./transitions.sh -kbest grammars/wmcfg/larsonian1.wmcfg "the story" 30 MYTAG
This will generate three PDF files, one for each of three positions in the 
sentence up to this point (i.e. one for each of the three prefixes "", "the" 
and "the story"), each showing the 30 best parses that are possible from that 
position.
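
To make the "one PDF per position" behaviour concrete, the following Python 
sketch lists the word-level prefixes of an initial fragment. It is only an 
illustration: the function name `prefixes` is hypothetical and not part of the 
repository, and it assumes that words are separated by whitespace.

```python
def prefixes(fragment):
    """Return every word-level prefix of `fragment`, starting with the empty one."""
    words = fragment.split()  # assumption: words are whitespace-separated
    # A fragment of n words has n + 1 prefixes, including the empty prefix.
    return [" ".join(words[:i]) for i in range(len(words) + 1)]


if __name__ == "__main__":
    for p in prefixes("the story"):
        print(repr(p))
```

For the fragment "the story" this gives the three prefixes "", "the" and 
"the story" mentioned above.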

=================================================
==== Adding or modifying Minimalist Grammars ====
=================================================

Before adding to or modifying the existing Minimalist Grammars, you will need 
to do the following:

(1) Get Mathieu Guillaumin's MG-to-MCFG translator from
      https://bitbucket.org/mguillau/mg2mcfg

    Either download it as a ZIP archive, or clone it with git:
       git clone https://bitbucket.org/mguillau/mg2mcfg.git guillaumin

    Either way, place the result so that "guillaumin" and your working copy 
    are sister directories. If you downloaded a ZIP archive called, for 
    instance, 0a470e1368b9.zip, unpack the relevant part of it with
        unzip 0a470e1368b9.zip '*hmg2mcfg/*' -d guillaumin

    and then, from inside the guillaumin directory, move the one program 
    we require up to the top level:
        mv mguillau-mg2mcfg-0a470e1368b9/hmg2mcfg hmg2mcfg
        rmdir mguillau-mg2mcfg-0a470e1368b9

(2) In the guillaumin/hmg2mcfg directory, run 'make clean' and 'make' to 
    compile the translator. This should produce an executable called hmg2mcfg.

Now, to set up a new Minimalist Grammar for the parser to work with, you 
need to provide one file yourself and then use make to generate three 
derived files. Let's call the root of the working copy $ROOT, and let's 
suppose that $NAME is a good name to associate with the new grammar.

The file you need to provide is:

    - The grammar in Prolog format:
        $ROOT/grammars/mg/$NAME.pl

The three derived files that the parser will need can be generated as follows:

    - The MCFG equivalent to the given MG:
        make grammars/mcfgs/$NAME.mcfg

    - The ``dictionary file'' that specifies how to map derivations in 
      this MCFG back to derivations in the original MG:
        make grammars/mcfgs/$NAME.dict

    - The weighted version of the MCFG:
        make grammars/wmcfg/$NAME.wmcfg
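
As a quick summary of where the four files live, here is a small Python 
sketch that builds the paths from $ROOT and $NAME. The function 
`grammar_files` and its dictionary keys are hypothetical names used only for 
illustration; the directory names are taken from the make targets above.

```python
from pathlib import Path


def grammar_files(root, name):
    """Map a short label to the path of each file associated with one grammar."""
    grammars = Path(root) / "grammars"
    return {
        "mg": grammars / "mg" / f"{name}.pl",           # provided by you
        "mcfg": grammars / "mcfgs" / f"{name}.mcfg",    # derived via make
        "dict": grammars / "mcfgs" / f"{name}.dict",    # derived via make
        "wmcfg": grammars / "wmcfg" / f"{name}.wmcfg",  # derived via make
    }


if __name__ == "__main__":
    # Report which of the files exist for a grammar called "mygrammar".
    for label, path in grammar_files(".", "mygrammar").items():
        print(label, path, "present" if path.exists() else "missing")
```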

==== About Assigning Weights ====

The third make command above will, by default, assign weights to the MCFG 
rules in a way that distributes the weight for each nonterminal uniformly 
across all the rules that expand it. This is a handy quick-and-dirty method 
for generating a WMCFG but is probably not actually what you want (note that 
the resulting grammar is often not consistent).
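
The uniform scheme can be sketched in a few lines of Python. This is an 
illustration of the idea only, not the code the Makefile runs: the function 
name `uniform_weights` is hypothetical, and the representation of rules as 
(left-hand side, right-hand side) pairs is an assumption that does not 
reflect the on-disk WMCFG format.

```python
from collections import defaultdict
from fractions import Fraction


def uniform_weights(rules):
    """Give each rule weight 1/n, where n is the number of rules with the same left-hand side.

    `rules` is a list of (lhs, rhs) pairs; this representation is an
    illustrative assumption, not the WMCFG file format.
    """
    counts = defaultdict(int)
    for lhs, _ in rules:
        counts[lhs] += 1
    # Exact fractions avoid rounding, so the weights for each lhs sum to exactly 1.
    return [(Fraction(1, counts[lhs]), lhs, rhs) for lhs, rhs in rules]


if __name__ == "__main__":
    toy = [("S", ("NP", "VP")), ("NP", ("Jon",)), ("NP", ("the", "dog"))]
    for weight, lhs, rhs in uniform_weights(toy):
        print(weight, lhs, "->", " ".join(rhs))
```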

The alternative is to use a training corpus to provide more sensible weights. 
A training corpus is simply a plain text file where each line consists of a 
sentence generated by the grammar, preceded by an integer specifying the 
sentence's frequency/weight. This should be saved as:
        $ROOT/grammars/train/$NAME.train

The Makefile looks for a file in this location when you try to make the 
wmcfg file (the third make command above). If it finds such a file, it uses 
it as the training corpus; otherwise, it uses the default quick-and-dirty 
method.
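
To illustrate the training corpus format, here is a minimal Python sketch 
that reads such a file into (frequency, sentence) pairs. The function names 
are hypothetical, and two details are assumptions that the description above 
does not settle: that the integer is separated from the sentence by 
whitespace, and that blank lines may be skipped.

```python
def parse_training_line(line):
    """Split one corpus line into (frequency, sentence)."""
    parts = line.split(None, 1)  # assumption: whitespace separates count and sentence
    if len(parts) != 2:
        raise ValueError(f"expected '<integer> <sentence>', got {line!r}")
    return int(parts[0]), parts[1].strip()


def read_training_corpus(lines):
    """Parse an iterable of lines, skipping blank ones (an assumption of this sketch)."""
    return [parse_training_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = ["3 Jon hit the dog\n", "1 Jon hit the dog with the stick\n"]
    for count, sentence in read_training_corpus(sample):
        print(count, sentence)
```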

If you have some other way of adding weights to the MCFG, that's fine: you 
can ignore the last of the three make commands above, and instead just 
place your $NAME.wmcfg file in grammars/wmcfg directly. Our training tool 
does not do anything remotely fancy. You may well have something more 
advanced.

=============================================================
==== Adding (M)CFGs not derived from Minimalist Grammars ====
=============================================================

Since the core of the system (the functionality encapsulated in the parse and 
intersect executables) actually works with MCFGs, you can also provide MCFGs 
of your own that are not derived from Minimalist Grammars. The grammar used 
in the examples above, repeated here, is in fact a (W)MCFG that was not 
derived from any MG:

      ./parse -g grammars/wmcfg/strauss.wmcfg -i "Jon hit the dog with the stick"
      ./intersect -g grammars/wmcfg/strauss.wmcfg -prefix "Jon hit"

Notice that there are no associated files grammars/{mcfgs,mg}/strauss.* of 
the sort that connect strauss.wmcfg back to a Minimalist Grammar. This does 
not cause a problem for anything you want to do using parse or intersect, which only 
use the wmcfg file you provide on the command line itself.

But transitions.sh (and the underlying executable visualize, which does the 
heavy lifting there) assumes that the MCFG it is given is the translation 
of an MG, and uses the associated files in grammars/{mcfgs,mg} to 
``un-translate'' the likely derivations back into MG derivations; notice 
that the trees in the PDFs produced by transitions.sh are MG derived trees 
(or X-bar trees).

==============================
==== For more information ====
==============================

See the CCPC user documentation, available online here:
    https://courses.cit.cornell.edu/jth99/ccpc-doc.pdf