Latinate Prefixes
I'm trying to get data on Latinate prefixes on English verbs.
The Data
Lexicon
I81379.csv is data from the English Lexicon Project, filtered to words with between 1 and 3 morphemes. I wanted to exclude inflected and derived forms as much as possible.
Prefixes
prefixes.txt is a list of Latin and Greek prefixes scraped from Wikipedia.
Current Status
At some point, there will be no way around coding some things by hand, but I'd like to keep the number of decisions I need to make as small as possible, and to make the process as reproducible as possible.
Right now, a lot of the prefixes scraped from Wikipedia are irrelevant, like vulp. It would be nice to pare these down in some principled way.
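One possible way to pare the list down, sketched here as an assumption and not as the method this project uses: count how many lexicon words begin with each prefix, and drop the prefixes that match nothing. The function name `count_prefix_matches`, the `min_remainder` threshold, and the toy word list are all hypothetical; the real words would come from the lexicon CSV, whose column names are not given here.

```python
from collections import Counter


def count_prefix_matches(words, prefixes, min_remainder=3):
    """Count, for each prefix, the words that start with it.

    min_remainder is a hypothetical threshold: a word must have at least
    this many letters left after the prefix, so "in" does not match "ink".
    """
    counts = Counter()
    for prefix in prefixes:
        # Normalize entries like "trans-" to "trans".
        p = prefix.lower().strip("-")
        if not p:
            continue
        for word in words:
            w = word.lower()
            if w.startswith(p) and len(w) - len(p) >= min_remainder:
                counts[p] += 1
    return counts


# Toy data for illustration only, not taken from the actual files.
words = ["meliorate", "transport", "import", "export", "ink"]
prefixes = ["trans-", "im-", "ex-", "vulp-", "in-", "melior-"]

counts = count_prefix_matches(words, prefixes)
attested = sorted(p for p in counts if counts[p] > 0)
print(attested)  # ['ex', 'im', 'melior', 'trans']
```

A plain string match like this is only a first filter. It cannot tell a real prefix from a word that happens to start with the same letters, so the surviving prefixes would still need to be checked by hand.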
Decisions to make
I need to decide how to handle "prefixes" from Wikipedia that aren't cognate with Latin prepositions, e.g., melior- in meliorate.
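Whatever the decision ends up being, it may help to make the split explicit first. The sketch below separates prefixes into those found in a hand-supplied set of prepositional prefixes and everything else. The set `PREPOSITIONAL` is an illustrative, incomplete assumption and not part of the project data, and `split_by_preposition` is a hypothetical helper name.

```python
# Illustrative, incomplete set of prefixes cognate with Latin prepositions.
# This set is an assumption for the sketch, not part of the project data.
PREPOSITIONAL = {
    "ab", "ad", "de", "ex", "in", "inter", "ob",
    "per", "post", "pro", "sub", "super", "trans",
}


def split_by_preposition(prefixes, prepositional=PREPOSITIONAL):
    """Return (cognate, other): prefixes in the set, and the rest."""
    cognate, other = [], []
    for prefix in prefixes:
        # Normalize entries like "trans-" to "trans".
        p = prefix.lower().strip("-")
        if p in prepositional:
            cognate.append(p)
        else:
            other.append(p)
    return cognate, other


cognate, other = split_by_preposition(["trans-", "melior-", "sub-", "vulp-"])
print(cognate)  # ['trans', 'sub']
print(other)    # ['melior', 'vulp']
```

Assimilated forms such as im- for in- would land in the "other" group with this exact-match lookup, so they would need either extra entries in the set or a separate mapping.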