
Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

GiellaLTLexTools

Scripts for testing lexicon-related things in GiellaLT, plus some Python scripts for processing lexc files.

Dependencies

Uses pyhfst to load HFST automata. Run poetry install to install the Python dependencies. Spell-checker testing also needs the divvunspell binary, which you can install with cargo.
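Before you run the tests, it can help to confirm that both dependencies are visible. The following is a minimal, illustrative check and is not part of giellaltlextools. It only looks for an importable pyhfst module and for a divvunspell executable on PATH. It assumes the binary is named divvunspell, as in the value passed to gtspelltest -D.

```python
import importlib.util
import shutil


def check_dependencies() -> dict[str, bool]:
    """Report whether the runtime dependencies can be found (illustrative only)."""
    return {
        # pyhfst: the Python library the tools use to load HFST automata.
        "pyhfst": importlib.util.find_spec("pyhfst") is not None,
        # divvunspell: binary needed only for spell-checker testing
        # (assumed to be on PATH under this name).
        "divvunspell": shutil.which("divvunspell") is not None,
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```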

Installation

You can install giellaltlextools with pipx:

$ pipx install git+https://github.com/divvun/giellaltlextools

Technical Details

This project uses Poetry's build system so that pyhfst is installed in its faster, Cython-compiled form. This is set up to happen automatically:

  • Build System: Declares Cython as a build-time requirement
  • Build Script: scripts/build.py automatically handles pyhfst optimization
  • Dependencies: Cython is included as both a runtime and build dependency

The build script runs automatically during poetry install and poetry build, so pyhfst is built with Cython support whenever Cython is available.
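You can check whether the installed pyhfst actually ended up with compiled code by looking for extension-module files inside the package. The sketch below is a hypothetical diagnostic and is not part of scripts/build.py. It assumes that a Cython build leaves compiled extension files (for example .so or .pyd) inside the pyhfst package directory.

```python
import importlib.machinery
import importlib.util
from pathlib import Path


def compiled_extensions(package: str) -> list[str]:
    """Return names of compiled extension files inside `package`.

    An empty list means the package is missing, is a single module,
    or contains only pure-Python files (hypothetical diagnostic).
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.submodule_search_locations is None:
        return []
    # Platform-specific suffixes, e.g. ".cpython-311-x86_64-linux-gnu.so".
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = []
    for location in spec.submodule_search_locations:
        for path in Path(location).rglob("*"):
            if path.is_file() and path.name.endswith(suffixes):
                found.append(path.name)
    return sorted(found)


if __name__ == "__main__":
    exts = compiled_extensions("pyhfst")
    if exts:
        print("pyhfst has compiled extensions:", ", ".join(exts))
    else:
        print("pyhfst is not installed or is pure Python only")
```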

Usage

The tools are mainly run from make check in the GiellaLT infrastructure.

There are currently three programs installed:

  • gtlemmatest, for testing that a generator generates the lemmas found in a lexc file
  • gtparadigmtest, for testing that a generator generates full paradigms of the lemmas
  • gtspelltest, for testing that a spell checker accepts the lemmas from lexc files

Lemma testing

$ gtlemmatest -l src/fst/morphology/stems/nouns.lexc \
    -a src/fst/analyser-gt-desc.hfstol \
    -g src/fst/generator-gt-desc.hfstol \
    -t +N+Sg+Nom -t +N+Pl+Nom

The lexc files should mainly consist of entries that contain full lemma forms.
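The check can be pictured roughly as follows. This is a simplified sketch, not gtlemmatest's implementation, and it makes several assumptions:

- The lexc parsing only handles plain entries of the form "lemma:stem CONTLEX ;" or "lemma CONTLEX ;". It does not handle % escapes, and it drops upper-side tags after the first "+".
- The generator is any callable that maps an analysis string to a set of surface forms. It stands in for the -g automaton.
- The -t tags are treated as alternatives, so a lemma passes if at least one of them generates it (for example +N+Pl+Nom for a plural-only noun).

The analyser (-a) is not modelled. The words are English toy data.

```python
import re
from collections.abc import Callable, Iterable

# Hypothetical stand-in for the -g generator: maps an analysis such as
# "cat+N+Sg+Nom" to the set of surface forms it generates.
GenerateFn = Callable[[str], set[str]]

# Simplified lexc entry: upper[:lower] CONTLEX ["gloss"] ;
ENTRY = re.compile(r'^\s*([^\s:;]+)(?::\S+)?\s+\S+\s*(?:"[^"]*"\s*)?;')


def lemmas_from_lexc(lines: Iterable[str]) -> list[str]:
    """Extract lemmas (upper sides) from simple lexc entry lines."""
    lemmas = []
    for line in lines:
        code = line.split("!", 1)[0]  # strip lexc comments (no % escapes)
        match = ENTRY.match(code)
        if not match:
            continue  # LEXICON headers, Multichar_Symbols, bare continuations
        lemma = match.group(1).split("+", 1)[0]  # drop upper-side tags
        if lemma:
            lemmas.append(lemma)
    return lemmas


def lemma_test(lemmas: Iterable[str], generate: GenerateFn, tags: list[str]) -> list[str]:
    """Return the lemmas that none of the tag strings generate back."""
    return [
        lemma
        for lemma in lemmas
        if not any(lemma in generate(lemma + tag) for tag in tags)
    ]


if __name__ == "__main__":
    toy = {"cat+N+Sg+Nom": {"cat"}, "scissors+N+Pl+Nom": {"scissors"}}
    lexc = ["LEXICON Nouns", "cat:cat N ;", "scissors N_PL ; ! plural only", "dog DOG ;"]
    lemmas = lemmas_from_lexc(lexc)
    print(lemma_test(lemmas, lambda a: toy.get(a, set()), ["+N+Sg+Nom", "+N+Pl+Nom"]))
    # → ['dog']
```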

Paradigm testing

$ gtparadigmtest -l src/fst/morphology/stems/nouns.lexc \
    -p src/fst/morphology/test/testnounparadigm.txt \
    -g src/fst/generator-gt-desc.hfstol

The lexc files should mainly consist of entries that contain full lemma forms.
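The idea of a paradigm test can be sketched as "every lemma must generate something for every paradigm cell". This is an illustration only, not gtparadigmtest's code. It makes two assumptions:

- The paradigm file (-p) is read as one tag string per line, with blank lines skipped. The real file format is not documented here.
- The generator is a hypothetical callable standing in for the -g automaton.

```python
from collections.abc import Callable, Iterable

# Hypothetical stand-in for the -g generator: analysis -> surface forms.
GenerateFn = Callable[[str], set[str]]


def read_paradigm(lines: Iterable[str]) -> list[str]:
    """Read tag strings, one per line (assumed format); skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


def paradigm_test(
    lemmas: Iterable[str], generate: GenerateFn, paradigm: list[str]
) -> dict[str, list[str]]:
    """Map each lemma to the paradigm cells for which nothing was generated."""
    gaps: dict[str, list[str]] = {}
    for lemma in lemmas:
        missing = [tags for tags in paradigm if not generate(lemma + tags)]
        if missing:
            gaps[lemma] = missing
    return gaps


if __name__ == "__main__":
    toy = {"cat+N+Sg+Nom": {"cat"}, "cat+N+Pl+Nom": {"cats"}, "dog+N+Sg+Nom": {"dog"}}
    paradigm = read_paradigm(["+N+Sg+Nom", "", "+N+Pl+Nom"])
    print(paradigm_test(["cat", "dog"], lambda a: toy.get(a, set()), paradigm))
    # → {'dog': ['+N+Pl+Nom']}
```

Reporting missing cells per lemma, rather than a single pass/fail, makes it easier to spot a continuation lexicon that is missing a whole branch.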

Spell-checker lemma testing

$ gtspelltest -z tools/spellcheckers/se.zhfst -D divvunspell \
    src/fst/morphology/stems/*.lexc

The lexc files should mainly consist of entries that contain full lemma forms.
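In the real tool, each lemma is checked by the divvunspell binary (-D) against the speller archive (-z). The sketch below does not assume divvunspell's command-line interface. Instead it takes an arbitrary accepts(word) callable, a hypothetical stand-in for that check, and reports which lemmas were rejected. Since the shell glob can pass several lexc files, the sketch checks each distinct lemma only once.

```python
from collections.abc import Callable, Iterable


def spell_test(
    lemmas: Iterable[str], accepts: Callable[[str], bool]
) -> tuple[list[str], float]:
    """Return the rejected lemmas and the share of distinct lemmas accepted.

    `accepts` is a hypothetical stand-in for asking the spell checker
    whether a word is correct.
    """
    # Lemmas from several lexc files may repeat; keep first-seen order.
    unique = list(dict.fromkeys(lemmas))
    if not unique:
        return [], 1.0
    rejected = [word for word in unique if not accepts(word)]
    return rejected, 1 - len(rejected) / len(unique)


if __name__ == "__main__":
    toy_speller = {"cat", "dog"}  # toy word list instead of a real speller
    rejected, ratio = spell_test(["cat", "scissors", "cat", "dog"], toy_speller.__contains__)
    print(rejected, round(ratio, 2))  # → ['scissors'] 0.67
```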