/apertium-tsx-lint

lint for Apertium tagger specification files

Primary LanguagePerl

NAME

apertium-tsx-lint - Test a TSX file for common problems

SYNOPSIS

apertium-tsx-lint tsx-file [DIC]

DESCRIPTION

Test a TSX file for for some common pitfalls.

TSX is one of the paths less well travelled, and tends to be a bit black magic, "oh my God, that's the funky shit"-y.

Currently, there are three checks:

WARN_CONFLICT

Warns of a conflict between tagger items (i.e., if they are exactly the same).

NO_OPEN_TAGS

Without tags for open classes, the tagger will fail to train, because it will be unable to handle unknown words.

MASKED_AMBIGUITY

Warns if the same tagger item matches two (or more) analyses.

FILES

tsx-file

The name of the tsx file to check

[DIC]

Optional dictionary file. This should be the same as the DIC file used by the tagger; i.e., an expansion of the analyses produced by the analyser.

TODO

  • Testing. Real testing, on real data, and not my silly examples of the sort of thing that might happen.

  • Documentation. 'Nuff said.

  • sort|uniq the analyses, to cut down on spurious warnings.

COPYRIGHT

Copyright 2013 Jimmy O'Regan

This program is free software; you can use, redistribute and/or modify it under the terms of either:

  • the GNU General Public License as published by the Free Software Foundation; version 2, or

  • the Artistic License version 2.0.