Word list for automatically suggesting words
Closed this issue · 0 comments
wbwseeker commented
https://www.webcorpora.org//opendata/frequencies/german/decow16b/
These are word frequency lists computed from crawled websites. They are sorted by frequency, with the most frequent words at the top, so working from the top down gives you the best chance of getting real words without typos. The lists contain all types of words, but since German nouns are capitalized, you can pick out the nouns by taking the words that start with a capital letter.
The frequency lists are licensed CC-BY; you just have to clean them up a bit.
I would:
- throw away all words that occur fewer than 10,000 times (or use a higher threshold to lower the risk even further) to get rid of the typos
- take only the words starting with a capital letter to get nouns
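The two steps above could be sketched roughly as follows. I have not checked the exact file layout, so this assumes one entry per line in the form `word<TAB>count`. The function name `filter_nouns`, the `MIN_COUNT` constant and the extra letters-only check are my own additions for illustration, not something the lists prescribe.

```python
from typing import Iterable, Iterator

# Threshold suggested above; raise it to be stricter about typos.
MIN_COUNT = 10_000


def filter_nouns(lines: Iterable[str], min_count: int = MIN_COUNT) -> Iterator[str]:
    """Yield frequent, capitalized words from 'word<TAB>count' lines.

    The tab-separated layout is an assumption; adjust the split
    if the real files use a different column order or separator.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed layout
        word, count_text = parts
        try:
            count = int(count_text)
        except ValueError:
            continue  # skip lines whose count is not a plain integer
        if count < min_count:
            continue  # rare entries are the most likely to be typos
        # Extra cleanup (assumption): keep purely alphabetic tokens only.
        if word.isalpha() and word[0].isupper():
            yield word


# Made-up sample data, only to show the expected input shape.
sample = [
    "der\t5000000",
    "Haus\t250000",
    "Hauss\t120",
    "Zeit\t900000",
    "2016\t50000",
]
nouns = list(filter_nouns(sample))
```

To run it on a real file, pass an open file handle instead of the sample list, e.g. `filter_nouns(open(path, encoding="utf-8"))`. Note that capitalized sentence-initial forms such as "Der" would also pass this filter, so the result probably still needs a quick manual check.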