A simple counter of prefixes, suffixes, and infixes in a text or XML file, with some handling of allomorphy.
For plain text:
python3 affcount3.py filename sequence of affixes
For text in XML files:
python3 affcount3-xml.py filename sequence of affixes
where filename is the name of the text file, and sequence of affixes is a space-separated list of prefixes, suffixes, or infixes.
Prefixes end with a dash, suffixes start with one, and infixes both start and end with one, for example re-, -ness, -bloody-. The dashes are removed before the forms are searched for.
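The dash convention above can be sketched in a few lines of Python. This is only an illustration of the notation, not the code in affcount3.py; the function name parse_affix is hypothetical.

```python
def parse_affix(arg):
    """Classify an affix argument by its dashes and return (kind, form).

    Hypothetical helper for illustration; not taken from affcount3.py.
    """
    starts = arg.startswith("-")
    ends = arg.endswith("-")
    form = arg.strip("-")  # the bare form that is actually searched for
    if not form or not (starts or ends):
        raise ValueError("affix needs a leading and/or trailing dash: %r" % arg)
    if starts and ends:
        return "infix", form
    if starts:
        return "suffix", form
    return "prefix", form


for arg in ["re-", "-ness", "-bloody-"]:
    print(arg, parse_affix(arg))
```

An argument without any dash, such as plain ness, is rejected in this sketch because its position in the word cannot be determined.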
affcount2.py is for Python 2, and affcount3.py is for Python 3, whatever that means in a messy PL.
I assume here that python3 calls Python 3, and plain python calls Python 2. Change them according to your system.
Examples:
python affcount2.py README.md -xes -er file- -fix-
python3 affcount3.py README.md -xes -er file- -fix-
python3 affcount3-xml.py fn.xml -xes -er file- -fix-
python affcount2.py fn -ler -lar
The last command, run on a Turkish text file fn, prints an allomorphic count of the plural (-ler and -lar).
Caveat: These aren't really allomorphic counts, because we don't do anything with semantics. We just look at the position of the "affixal form" in a word. For example, the Turkish word "kiler" (cellar) is not plural, but it ends in -ler, so the program counts it as a "suffix".
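The caveat is easy to reproduce with a purely positional count. The sketch below is an assumption about how such a count can be done, not the program's actual code; count_affix and its word splitting by regular expression are hypothetical.

```python
import re


def count_affix(text, kind, form):
    """Count words that carry `form` in the given position.

    kind is "prefix", "suffix", or "infix". Only the position of the
    form is checked; nothing is known about meaning. Hypothetical
    helper, not taken from affcount3.py.
    """
    words = re.findall(r"\w+", text.lower())
    count = 0
    for word in words:
        if len(word) <= len(form):
            continue  # the form alone is not an affixed word
        if kind == "prefix":
            hit = word.startswith(form)
        elif kind == "suffix":
            hit = word.endswith(form)
        else:
            # infix: the form must sit strictly inside the word
            hit = form in word[1:-1]
        if hit:
            count += 1
    return count


# "evler" (houses) is a real plural; "kiler" (cellar) is not,
# yet both end in "ler" and both are counted.
print(count_affix("Evler ve kiler", "suffix", "ler"))  # 2
```

Telling the two apart would require a lexicon or a morphological analyzer, which is outside what a positional counter does.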