This repository includes code to define grammars of Samoan stress over monomorphs in xfst, and to compare their output with the correct "gold standard" output. The most recent write-up of this work is the PDF file in the top-level directory.
To run this code you need to install xfst, which is available here:
The repository contains two sub-directories:
xfst-code: this contains shell scripts and xfst code to define the grammars and compare them to the gold standard. More details are in the section on the xfst code, including annotated output from a sample run.
otsoft-files: this contains files from runs with OTSoft that show how partial rankings were computed. More details are in the section on the OTSoft files.
The xfst-code directory contains a Makefile and five sub-directories:
auxiliary: this contains auxiliary files that are called by xfst in the definition of the grammars, an input file to test grammars for overgeneration, and "gold standard" output files that contain the correct set of outputs that should be generated by the grammars.
dir-ft: this contains xfst code to define the direct-foot grammar, two text files chk-overgen.txt and chk-undergen.txt generated by the xfst code to check for over- and undergeneration, and a Makefile that is called by the Makefile in the top-level directory (xfst-code/).
dir-syll: like dir-ft, but the xfst code defines the direct-syllable grammar.
ot-ft: like dir-ft, but the xfst code defines the OT-foot grammar.
ot-syll: like dir-ft, but the xfst code defines the OT-syllable grammar.
In the xfst-code directory, to run the direct foot code, type:
make dir-ft
To run the direct syllable code, type:
make dir-syll
To run the OT foot code, type:
make ot-ft
To run the OT syllable code, type:
make ot-syll
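The four targets can also be run back to back. The bash function below is a convenience sketch that is not part of the repository (the name run_all_grammars is hypothetical): run from the xfst-code directory, it calls make on each target in turn and stops at the first one that fails. Passing a different command as the first argument, for example echo, gives a dry run that only prints the target names.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): run each grammar's make target in turn.
# Usage from xfst-code/:
#   run_all_grammars        # runs make dir-ft, make dir-syll, make ot-ft, make ot-syll
#   run_all_grammars echo   # dry run: only prints the target names
run_all_grammars() {
  local runner="${1:-make}"
  local target
  for target in dir-ft dir-syll ot-ft ot-syll; do
    # Stop at the first target whose command exits with a non-zero status.
    if ! "$runner" "$target"; then
      echo "FAILED: $runner $target" >&2
      return 1
    fi
  done
}
```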
Here is the annotated output you should get from running the direct foot code, after invoking make dir-ft in the xfst-code directory. I have broken it up into chunks to make it easier to describe.
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C dir-ft all
+++ Sort gold standard correct list of 47 outputs +++
sort "../auxiliary/gold-output5-footed.txt" > tmp-gold
+++ Define grammar in xfst and generate outputs for testing over/undergeneration +++
xfst -f grammar.xfst
This first output chunk shows the creation of a temporary file tmp-gold, a sorted text file of the correctly footed/stressed outputs for the range of data being considered, and the call to xfst to run the code in grammar.xfst.
<< Generating language of input strings >>
Opening input file '../auxiliary/gen.xfst'
Defined 'Input': 656 bytes. 1 state, 2 arcs, Circular.
Defined 'SWParse': 2.5 Kb. 4 states, 9 arcs, Circular.
Defined 'ElevateProm': 704 bytes. 1 state, 4 arcs, Circular.
Defined 'Gen': 888 bytes. 4 states, 7 arcs, Circular.
Closing file ../auxiliary/gen.xfst...
This second output chunk shows the first output from running grammar.xfst in xfst, which runs ../auxiliary/gen.xfst to define the finite-state transducer for Gen.
<< Parsing into (binary) feet >>
Defined 'Heavy': 776 bytes. 4 states, 3 arcs, 1 path.
Defined 'Light': 776 bytes. 4 states, 3 arcs, 1 path.
Defined 'ParseFoot': 5.0 Kb. 18 states, 93 arcs, Circular.
This third output chunk shows more output from running grammar.xfst in xfst. Here we define transducers for the auxiliary terms Heavy (heavy syllables) and Light (light syllables), which are referred to in the definition of ParseFoot, the transduction that parses the input from Gen into feet.
<< Define restrictions on feet >>
Defined 'Foot': 488 bytes. 3 states, 3 arcs, Circular.
Defined 'PrimaryFoot': 848 bytes. 4 states, 6 arcs, Circular.
Defined 'WeakLight': 832 bytes. 5 states, 4 arcs, 1 path.
Defined 'LLFoot': 1.3 Kb. 11 states, 15 arcs, 6 paths.
Defined 'Trochee': 1.4 Kb. 11 states, 17 arcs, 14 paths.
This fourth output chunk shows more output from running grammar.xfst in xfst. Here we define transducers that place restrictions on feet.
<< Define restrictions on words in terms of feet >>
Defined 'PrimaryFootRight': 944 bytes. 4 states, 10 arcs, Circular.
Defined 'TrocheesOnly': 1.5 Kb. 13 states, 21 arcs, Circular.
Defined 'InitialDactyl': 3.6 Kb. 16 states, 105 arcs, Circular.
Defined 'ReplaceUnparsedX': 3.4 Kb. 9 states, 43 arcs, Circular.
Defined 'LSmo': 2.5 Kb. 31 states, 36 arcs, Circular.
This fifth output chunk shows further output from running grammar.xfst in xfst. Here we define transducers that place restrictions on the stress patterns of words in terms of feet, and we define the final transduction LSmo: that is the whole grammar.
Now we are ready to test the expressiveness of the defined grammar LSmo. The output chunk below shows that, at this point, grammar.xfst has called ../auxiliary/test-overgen.xfst, which prepares the test for overgeneration of LSmo. It computes the outputs of all possible inputs of up to 5 syllables by composing the set of those inputs (Inputs5) with LSmo, and writes these outputs to a text file chk-overgen.txt.
<< Testing expressiveness of grammar >>
Opening input file '../auxiliary/test-overgen.xfst'
Opening input file '../auxiliary/input5.txt'
Reading UTF-8 text from '../auxiliary/input5.txt'
1.0 Kb. 6 states, 10 arcs, 62 paths.
Defined 'Inputs5': 1.0 Kb. 6 states, 10 arcs, 62 paths.
Defined 'GenInputs5': 4.7 Kb. 67 states, 79 arcs, 47 paths.
4.7 Kb. 67 states, 79 arcs, 47 paths.
Opening 'chk-overgen.txt'
Closing 'chk-overgen.txt'
Closing file ../auxiliary/test-overgen.xfst...
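The 62 paths reported for Inputs5 match the number of distinct sequences of light (L) and heavy (H) syllables that are 1 to 5 syllables long: 2 + 4 + 8 + 16 + 32 = 62. The bash sketch below enumerates such sequences to make that count concrete. The symbols L and H and the function name enumerate_lh_strings are placeholders; the notation actually used in ../auxiliary/input5.txt may differ.

```bash
#!/usr/bin/env bash
# Sketch: print every string over {L, H} of length 1..max_len, one per line.
# "L" and "H" are placeholder symbols for light and heavy syllables.
enumerate_lh_strings() {
  local max_len="${1:-5}"
  local -a current=("") next=()
  local n w
  for ((n = 1; n <= max_len; n++)); do
    next=()
    # Extend every string of length n-1 by one light and one heavy syllable.
    for w in "${current[@]}"; do
      next+=("${w}L" "${w}H")
    done
    current=("${next[@]}")
    printf '%s\n' "${current[@]}"
  done
}

# Strings of up to 5 syllables: 2 + 4 + 8 + 16 + 32 = 62 lines.
enumerate_lh_strings 5 | wc -l
```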
In the next output chunk, grammar.xfst has called ../auxiliary/test-undergen.xfst, which prepares the test for undergeneration of LSmo. It composes the final transducer LSmo with Outputs5, the "gold standard" set of permitted outputs for up to 5 syllables (defined by the author), and writes the result to a text file chk-undergen.txt.
Opening input file '../auxiliary/gold-output5-footed.txt'
Reading UTF-8 text from '../auxiliary/gold-output5-footed.txt'
4.7 Kb. 67 states, 79 arcs, 47 paths.
Opening input file '../auxiliary/test-undergen.xfst'
Defined 'Outputs5': 4.7 Kb. 67 states, 79 arcs, 47 paths.
Defined 'ChkLSmo': 4.7 Kb. 67 states, 79 arcs, 47 paths.
4.7 Kb. 67 states, 79 arcs, 47 paths.
Opening 'chk-undergen.txt'
Closing 'chk-undergen.txt'
Closing file ../auxiliary/test-undergen.xfst...
bye.
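Why the intersection reveals undergeneration can be seen with plain sets of strings: keeping only those generated outputs that are also in the gold standard leaves the whole gold standard exactly when no gold-standard output is missing. The bash sketch below is only a set-based analogy to composing LSmo with Outputs5, not the repository's code; it assumes two plain files with one output string per line, and the function name no_undergeneration is hypothetical. comm -12 prints the lines common to two sorted inputs.

```bash
#!/usr/bin/env bash
# Set-based analogy (not the repository's code) of the undergeneration test:
# intersect the generated outputs with the gold standard, then compare sizes.
#   $1: file of generated output strings, one per line
#   $2: file of gold-standard output strings, one per line
no_undergeneration() {
  local generated="$1" gold="$2"
  local shared total
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C on each sort.
  shared=$(comm -12 <(LC_ALL=C sort -u "$generated") <(LC_ALL=C sort -u "$gold") | wc -l)
  total=$(LC_ALL=C sort -u "$gold" | wc -l)
  # Every gold string survives the intersection iff the two counts are equal.
  (( shared == total ))
}
```

With a gold file containing a, b and c, a generated file containing a, b, c and x passes (the extra x is a matter for the overgeneration check), while one containing only a and c fails.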
With bye, we have quit xfst and are ready for the final checks for overgeneration and undergeneration. The first chunk below looks for any differences between the tmp-gold file (a text file of the correctly footed/stressed outputs for the range of data being considered) and the output from LSmo for this same range of data, all light-heavy sequences of up to 5 syllables. No Files differ message is printed, so there are no differences: LSmo did not derive any strings in the language other than the correct ones. The wc -l command then counts the lines of chk-overgen.txt after the first: there are 47, as expected.
+++ For generated output of grammar for L* H* inputs up to 5 syllables +++
+++ check against correct set of outputs for inputs up to 5 syll +++
+++ Check if any overgeneration compared to gold standard +++
sort < chk-overgen.txt | sed '1d' | diff tmp-gold - || echo 'Files differ'
+++ Check number of output strings from chk-overgen.txt +++
sed '1d' chk-overgen.txt | wc -l
47
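To see what the check prints when a grammar does overgenerate, the same sort | sed '1d' | diff pipeline can be run on toy files. The sketch below wraps that command from the log in a hypothetical function, check_against_gold; the toy strings are made up, and a blank first line stands in for whatever line sed '1d' removes from the real xfst output file.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the comparison command shown in the log above.
#   $1: gold-standard file, already sorted (like tmp-gold)
#   $2: output file written by xfst (like chk-overgen.txt)
check_against_gold() {
  local gold="$1" output="$2"
  # Sort the xfst output, drop its first line, and diff the rest against the gold file.
  sort < "$output" | sed '1d' | diff "$gold" - || echo 'Files differ'
}

# Toy demo; a blank first line stands in for the line that sed '1d' removes.
demo_dir=$(mktemp -d)
printf 'a\nb\n' > "$demo_dir/gold"
printf '\nb\na\n' > "$demo_dir/ok.txt"       # exactly the gold strings
printf '\nb\na\nz\n' > "$demo_dir/over.txt"  # one extra string, z
check_against_gold "$demo_dir/gold" "$demo_dir/ok.txt"    # prints nothing
check_against_gold "$demo_dir/gold" "$demo_dir/over.txt"  # prints the diff (> z), then Files differ
rm -rf "$demo_dir"
```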
The final output chunk below looks for any differences between the tmp-gold file (a text file of the correctly footed/stressed outputs for the range of data being considered) and the output from LSmo restricted to the author-defined set of correct outputs (chk-undergen.txt). Again no Files differ message is printed, so there are no differences: LSmo did not miss deriving any strings in the language for this range of data. The wc -l command again counts the output strings: there are 47, as expected.
+++ From intersecting output of grammar with correct set of outputs for inputs up to 5 syll +++
+++ Check against correct set of outputs for inputs up to 5 syll +++
sort < chk-undergen.txt | sed '1d' | diff tmp-gold - || echo 'Files differ'
+++ Check number of output strings from chk-undergen.txt +++
sed '1d' chk-undergen.txt | wc -l
47
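The count of 47 is checked by eye here. A possible automatic version, assuming you want the expected count read from the gold-standard file rather than hard-coded, compares the number of lines left after sed '1d' with the number of lines in tmp-gold. The function name counts_match is hypothetical, and this check is not part of the repository's Makefile.

```bash
#!/usr/bin/env bash
# Hypothetical helper: does the xfst output file (minus its first line, as in the log)
# contain as many lines as the gold-standard file?
#   $1: output file written by xfst (like chk-undergen.txt)
#   $2: gold-standard file (like tmp-gold)
counts_match() {
  local output="$1" gold="$2"
  local got expected
  got=$(sed '1d' "$output" | wc -l)
  expected=$(wc -l < "$gold")
  # Arithmetic comparison tolerates the leading spaces some wc implementations print.
  if (( got == expected )); then
    echo "OK: $((got)) output strings"
  else
    echo "Count mismatch: got $((got)), expected $((expected))" >&2
    return 1
  fi
}
```

A matching count is only a sanity check; the diff step is what confirms that the strings themselves agree.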
rm -f tmp-gold
Constraint rankings were computed using OTSoft, which is a Windows program available for download here. The citation for this software is:
Hayes, Bruce, Bruce Tesar, and Kie Zuraw (2013) "OTSoft 2.5," software package, http://www.linguistics.ucla.edu/people/hayes/otsoft/.
The otsoft-files directory contains two sub-directories:
ft: this contains files computing the partial ranking of the OT constraint set referring to feet given in the paper.
syll: this contains files computing the partial ranking of OT constraint sets referring only to syllables:
    actual: contains files from the OT constraint set given in the paper.
    test-with-rlt: contains files from a larger OT constraint set that includes the constraints of rhythmic licensing theory (Kager 2001, 2005), to show that even with all of those constraints in the constraint set, the gradient Align constraints are still necessary.