Author: John K Pate
Release date: Jan 25 2010
E-mail: j.k.pate@sms.ed.ac.uk

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

This is the first release of the ShakesEM library for doing Expectation-Maximization for Probabilistic Context Free Grammars. The library can be compiled simply with:

  $ scalac ShakesEM.scala

The name of the library, ShakesEM, is a reference to William Shakespeare, chosen because the library uses Scala Actors for distributed processing.

The ``example'' directory shows a basic use of the library. It contains an example grammar file, an example lexicon, a corpus of 10 (mostly nonsense) sentences, and a directory that stores the resulting grammars. The rest of the files were generated with:

  $ scala shakesEMExample toyGrammar.txt toyLexicon.txt testSentences.txt 2 \
      0.001 exampleOutput/exampleRun &> exampleRun.log

The number following ``testSentences.txt'' in the above example is the number of parsers to start. You can start as many parsers as you like, up to (and including) the number of sentences in your corpus. If you start fewer parsers than you have processor cores, you will use as many cores as you have parsers. If you start more parsers than you have processor cores, you will use all your cores and the parsers will share computing resources transparently.

Note that scalac uses the '-d' flag to decide where to place JVM bytecode, and scala uses the same flag to decide where to search for it.

The ``scaladoc'' directory contains documentation generated by scaladoc (similar to javadoc).
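
The guidance on the parser count can be sketched as a small shell helper. This is only an illustration, not part of ShakesEM: choose_parsers is a hypothetical function, and it assumes that one parser per core is a reasonable starting point and that getconf _NPROCESSORS_ONLN is available to report the core count (as on Linux and macOS). It prints the smaller of the core count and the sentence count, so the result never exceeds the number of sentences in the corpus.

```shell
#!/bin/sh
# choose_parsers is a hypothetical helper, not part of ShakesEM.
# It prints the smaller of the core count ($1) and the sentence count ($2),
# so the parser count never exceeds the number of sentences.
choose_parsers() {
  cores=$1
  sentences=$2
  if [ "$cores" -lt "$sentences" ]; then
    echo "$cores"
  else
    echo "$sentences"
  fi
}

# Assumption: getconf reports the number of online cores; fall back to 1.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# The example corpus contains 10 sentences.
choose_parsers "$cores" 10
```

The printed value could then be used in place of the ``2'' that follows ``testSentences.txt'' in the example command.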