/latinate

I'm trying to get a list of English verbs with Latinate prefixes.

Latinate Prefixes

I'm trying to get data on Latinate prefixes on English verbs.

The Data

Lexicon

I81379.csv is data from the English Lexicon Project, filtered to words with between 1 and 3 morphemes. I wanted to exclude inflected and derived forms as much as possible.

Prefixes

prefixes.txt contains Latin and Greek prefixes scraped from Wikipedia.

Current Status

At some point, there will be no way around coding some things by hand, but I'd like to reduce the number of decisions I need to make as much as possible, and to make the process as reproducible as possible.

Right now, a lot of the prefixes in the data scraped from Wikipedia are irrelevant, like vulp. It would be nice to pare these down in some principled way.
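One possible way to pare the list down, sketched here only as an illustration, is to count how many lexicon words begin with each prefix and drop prefixes that fall below a threshold. The toy vectors `words` and `prefixes`, the helper `count_matches`, and the `min_count` threshold are all hypothetical; the real inputs would come from I81379.csv and prefixes.txt, whose column layout is not assumed here.

```r
# Hypothetical toy data standing in for the lexicon and the scraped prefix list.
words    <- c("submit", "subtract", "transmit", "transfer", "permit", "meliorate")
prefixes <- c("sub", "trans", "per", "vulp", "melior")

# Count how many words begin with a given prefix (literal match, no regex).
count_matches <- function(prefix, words) {
  sum(startsWith(words, prefix))
}

# Named integer vector: one count per prefix.
counts <- vapply(prefixes, count_matches, integer(1), words = words)

# Keep prefixes attested at least `min_count` times.
# The threshold is an assumption; a higher value is stricter.
min_count <- 1L
kept <- names(counts)[counts >= min_count]

print(counts)
print(kept)
```

A frequency filter like this only removes prefixes that never (or rarely) occur word-initially in the lexicon. It cannot tell a true prefix from an accidental string match, so some hand-checking would still be needed.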

Decisions to make

I need to decide how I want to handle "prefixes" from Wikipedia which aren't cognate with Latin prepositions, e.g., melior- in meliorate.
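Whatever the decision ends up being, it may help to flag each scraped prefix by whether it matches a Latin preposition, so the two groups can be handled separately. The sketch below assumes a hand-entered and deliberately incomplete `latin_prepositions` vector and assumes that scraped prefixes carry a trailing hyphen; neither is confirmed by the data files.

```r
# Hand-entered, incomplete list of Latin prepositions (an assumption for illustration).
latin_prepositions <- c("ab", "ad", "de", "ex", "in", "inter", "ob", "per",
                        "prae", "pro", "sub", "super", "trans")

# Hypothetical sample of scraped prefixes, assumed to end in a hyphen.
scraped <- c("sub-", "trans-", "melior-", "vulp-", "per-")

# Strip the trailing hyphen, then test exact membership in the preposition list.
stems <- sub("-$", "", scraped)
prefix_table <- data.frame(
  prefix           = stems,
  is_preposition   = stems %in% latin_prepositions,
  stringsAsFactors = FALSE
)

print(prefix_table)
```

Exact matching misses respelled or assimilated forms (for example pre- from prae, or com-/con- from cum), so those would need either extra entries in the list or hand-coding.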