/hfst-ospell-cg

simple cg output from hfst-ospell, expects one word per line as input

Primary LanguageC++GNU General Public License v3.0GPL-3.0

hfst-ospell-cg

https://travis-ci.org/unhammer/hfst-ospell-cg.svg

Description

This program spells using hfst-ospell, expecting one word per line, outputting Constraint Grammar format readings.

Prerequisites

  • gcc >=5.0.0 with libstdc++-5-dev (or similarly recent version of clang, with full C++11 support)
  • hfst-ospell-dev >=0.4.3

Tested with gcc-5.4.0. On Mac OS X, the newest XCode includes a modern C++ compiler.

Building

./autogen.sh
./configure  # optionally with argument --prefix=$HOME/my/prefix
make
make install # with sudo if you didn't specify a prefix

On OS X, you may have to do this:

export CC=clang CXX=clang++ "CXXFLAGS=-std=gnu++11 -stdlib=libc++"
./autogen.sh
./configure  LDFLAGS=-L/opt/local/lib # optionally with argument --prefix=$HOME/my/prefix
make
make install # with sudo if you didn't specify a prefix

Usage

Takes one arguments: a hfst-ospell zhfst archive

src/hfst-ospell-cg se.zhfst < input > output

You give a max weight (inclusive, based on error model) with -w, or max suggestions with -n, e.g.

src/hfst-ospell-cg -w 30000 se.zhfst < input > output
src/hfst-ospell-cg -n 3000 se.zhfst < input > output

You can also give a max analysis weight with -W, since the analyser may have its own weights (e.g. compound tags have higher weights):

src/hfst-ospell-cg -w 10000 se.zhfst < input > output

Note that FST weights are multiplied by 1000 and cast to integers, since CG expects numerical tags to be integral. Comparisons are inclusive (greater-than-or-equals).

Troubleshooting

If you get

terminate called after throwing an instance of 'std::regex_error'
  what():  regex_error

or

util.hpp:36:19: fatal error: codecvt: No such file or directory
 #include <codecvt>
                   ^
compilation terminated.

then your C++ compiler is too old. See Prerequisites.

Progress [2/4]

This should:

  • [X] load a zhfst bin
  • [X] output CG analyses
  • [ ] do NUL-flushing, outputting STREAMCMD:FLUSH
  • [ ] deal with subreadings