top-english-wordlists

Lists of the most frequently used English words, nouns, verbs, etc.

Primary language: Python

Lists of the most frequently used (top) English words

Data source: Google ngrams English data set version 20120701, years 1950 to 2012.

This compilation is licensed under a Creative Commons Attribution 3.0 Unported License.

These collections can save you time and data if you don't want to download and process the Google ngram data yourself. The scripts used to generate the wordlists are also available. Depending on your planned use, the wordlists may need additional filtering: the dataset contains some non-English characters and words, letters followed by numbers, symbols, web addresses, etc.
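As a rough illustration of the kind of additional filtering mentioned above, the sketch below keeps only entries made of ASCII letters, optionally joined by internal apostrophes or hyphens. The pattern and the `filter_words` helper are assumptions made for this example, not part of the repository's scripts; loosen or tighten the rule to suit your use.

```python
import re

# Assumption: a "clean" word is ASCII letters, optionally joined by
# single internal apostrophes or hyphens (e.g. "don't", "well-known").
_WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def filter_words(lines):
    """Yield entries that look like plain English words, preserving order."""
    for line in lines:
        word = line.strip()
        if word and _WORD_RE.fullmatch(word):
            yield word


# Hypothetical sample entries showing the kinds of noise described above.
sample = ["the", "www.example.com", "a1", "naïve", "don't", "§", "well-known"]
cleaned = list(filter_words(sample))
print(cleaned)
```

Because the filter only drops entries, the frequency ordering of the remaining words is unchanged. Note that this particular rule also discards legitimate words containing accented letters.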

If you are looking for more sanitised data, for a spell checker for example, or want forms/variations of a word, check out SCOWL and friends.

The words are sorted so the most frequently used words appear first.
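Since the most frequent words come first, a word's position in a list can serve as its frequency rank. The following is a minimal sketch that assumes each file holds one word per line (an assumption about the file layout, not something stated here); `top_words` and `rank_index` are hypothetical helper names.

```python
from itertools import islice


def top_words(lines, limit=None):
    """Return up to `limit` non-empty entries, keeping the original order."""
    stripped = (line.strip() for line in lines)
    return list(islice((word for word in stripped if word), limit))


def rank_index(words):
    """Map each word to its 1-based rank; the first occurrence wins."""
    ranks = {}
    for position, word in enumerate(words, start=1):
        ranks.setdefault(word, position)
    return ranks


# Stand-in for the lines of a wordlist file (hypothetical content).
sample_lines = ["the\n", "of\n", "and\n", "to\n"]
top3 = top_words(sample_lines, limit=3)
ranks = rank_index(top3)
```

With a real file, the same functions could be fed an open file handle, for example `top_words(open("top_english_words_lower_10000.txt", encoding="utf-8"), limit=1000)`; the UTF-8 encoding is likewise an assumption.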

Top 1000000 English words, all lowercase

top_english_words_lower_1000000.txt
top_english_words_lower_500000.txt
top_english_words_lower_100000.txt
top_english_words_lower_50000.txt
top_english_words_lower_20000.txt
top_english_words_lower_10000.txt

Top 1000000 English words, mixed case

top_english_words_mixed_1000000.txt
top_english_words_mixed_500000.txt
top_english_words_mixed_100000.txt
top_english_words_mixed_50000.txt
top_english_words_mixed_20000.txt
top_english_words_mixed_10000.txt

Top 500000 English nouns, all lowercase

top_english_nouns_lower_500000.txt
top_english_nouns_lower_100000.txt
top_english_nouns_lower_50000.txt
top_english_nouns_lower_20000.txt
top_english_nouns_lower_10000.txt

Top 500000 English nouns, mixed case

top_english_nouns_mixed_500000.txt
top_english_nouns_mixed_100000.txt
top_english_nouns_mixed_50000.txt
top_english_nouns_mixed_20000.txt
top_english_nouns_mixed_10000.txt

Top 100000 English verbs, all lowercase

top_english_verbs_lower_100000.txt
top_english_verbs_lower_50000.txt
top_english_verbs_lower_20000.txt
top_english_verbs_lower_10000.txt

Top 100000 English verbs, mixed case

top_english_verbs_mixed_100000.txt
top_english_verbs_mixed_50000.txt
top_english_verbs_mixed_20000.txt
top_english_verbs_mixed_10000.txt

Top 100000 English adjectives, all lowercase

top_english_adjs_lower_100000.txt
top_english_adjs_lower_50000.txt
top_english_adjs_lower_20000.txt
top_english_adjs_lower_10000.txt

Top 100000 English adjectives, mixed case

top_english_adjs_mixed_100000.txt
top_english_adjs_mixed_50000.txt
top_english_adjs_mixed_20000.txt
top_english_adjs_mixed_10000.txt

Top 10000 English adverbs, all lowercase

top_english_advs_lower_10000.txt

Top 10000 English adverbs, mixed case

top_english_advs_mixed_10000.txt

Top 10000 English pronouns, all lowercase

top_english_prons_lower_10000.txt

Top 10000 English pronouns, mixed case

top_english_prons_mixed_10000.txt

Top 500 English numeric words, all lowercase

top_english_nums_lower_500.txt

Top 500 English conjunction words, all lowercase

top_english_conjs_lower_500.txt

Top 500 English determiner/article words, all lowercase

top_english_dets_lower_500.txt

Top 500 English particle words, all lowercase

top_english_prts_lower_500.txt

Top 500 English adposition (preposition/postposition) words, all lowercase

top_english_adps_lower_500.txt
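
The file names above follow the pattern top_english_<category>_<case>_<size>.txt. The sketch below picks the smallest listed file that could hold a requested number of words. The `wordlist_filename` helper is hypothetical, and the size in a file name is treated only as an upper bound, since a list may contain fewer entries than its name suggests.

```python
# Sizes taken from the file listing above, smallest first.
_COMMON = [10000, 20000, 50000, 100000]
SIZES = {
    "words": _COMMON + [500000, 1000000],
    "nouns": _COMMON + [500000],
    "verbs": _COMMON,
    "adjs": _COMMON,
    "advs": [10000],
    "prons": [10000],
    "nums": [500],
    "conjs": [500],
    "dets": [500],
    "prts": [500],
    "adps": [500],
}
# Categories listed only in an all-lowercase variant.
LOWER_ONLY = {"nums", "conjs", "dets", "prts", "adps"}


def wordlist_filename(category, needed, case="lower"):
    """Return the smallest listed file for `category` with size >= `needed`."""
    if category not in SIZES:
        raise ValueError(f"unknown category: {category!r}")
    if case not in ("lower", "mixed"):
        raise ValueError(f"unknown case: {case!r}")
    if case == "mixed" and category in LOWER_ONLY:
        raise ValueError(f"no mixed-case list for {category!r}")
    for size in SIZES[category]:
        if size >= needed:
            return f"top_english_{category}_{case}_{size}.txt"
    raise ValueError(f"no {category!r} list with {needed} entries")


print(wordlist_filename("nouns", 30000))
```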