Summary
An estimate of the relative frequencies of English phonemes, plus an estimate of the relative frequencies of the phonemes that follow /w/.
Methodology
Reproducing the work of Doug Blumeyer, I cross-referenced the CMU Pronouncing Dictionary ("CMUdict") with Adam Kilgarriff's unlemmatized frequency list for the British National Corpus to estimate overall phoneme frequencies. I extended this technique to estimate post-/w/ phoneme frequencies as well.
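The general idea can be sketched as follows. This is a minimal illustration rather than the actual script used here: the function names, the toy data, and the assumptions that dictionary headwords are uppercase, that phonemes are stress-free ARPAbet symbols, and that each phoneme occurrence is weighted by its word's corpus count are all mine.

```python
from collections import Counter


def phoneme_frequencies(pronunciations, word_counts):
    """Weight each phoneme by the corpus count of the word it occurs in.

    pronunciations: dict mapping an uppercase word to a list of phonemes
                    (hypothetical in-memory form of the dictionary).
    word_counts:    dict mapping a word to its corpus frequency
                    (hypothetical in-memory form of the frequency list).
    Returns (overall, after_w) as Counters of weighted phoneme counts.
    """
    overall = Counter()
    after_w = Counter()
    for word, count in word_counts.items():
        phonemes = pronunciations.get(word.upper())
        if phonemes is None:
            continue  # word has no dictionary entry; skip it
        for i, phoneme in enumerate(phonemes):
            overall[phoneme] += count
            # Count the phoneme separately if it directly follows /w/.
            if i > 0 and phonemes[i - 1] == "W":
                after_w[phoneme] += count
    return overall, after_w


def relative(counter):
    """Convert weighted counts to relative frequencies that sum to 1."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}


# Toy data for illustration only; not taken from either dataset.
toy_pronunciations = {
    "WE": ["W", "IY"],
    "WAS": ["W", "AH", "Z"],
    "SEE": ["S", "IY"],
}
toy_counts = {"we": 3, "was": 2, "see": 1, "xyzzy": 5}

overall, after_w = phoneme_frequencies(toy_pronunciations, toy_counts)
print(relative(overall))
print(relative(after_w))
```

Words missing from the dictionary (like "xyzzy" in the toy data) are simply dropped, so they contribute nothing to either estimate.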
Limitations
As Blumeyer notes, the source datasets have some limitations. CMUdict conflates "schwa with the near-open central vowel" and has "several noticeable errors." Kilgarriff's frequency list has some formatting issues that make it hard to work with words containing accents or apostrophes, including common contractions. At this time, I have completely ignored this issue.
Blumeyer did manual error checking on several hundred of the most common words. I have not done this.
CMUdict has multiple pronunciations for some words. For these words, I used only the first pronunciation given. It's not clear to me whether the multiple pronunciations are ordered in some meaningful way or just arbitrarily.
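A rough sketch of keeping only the first pronunciation, assuming the CMUdict 0.7b plain-text layout: comment lines begin with ";;;", each entry is a headword followed by its phonemes, and alternate pronunciations carry a numbered suffix such as "WORD(1)". The function name and sample lines are hypothetical, and stripping the stress digits from vowels is my own simplification, not something stated above.

```python
import re

# Matches the "(1)", "(2)", ... suffix that marks an alternate pronunciation.
ALTERNATE_SUFFIX = re.compile(r"\(\d+\)$")


def first_pronunciations(lines):
    """Map each headword to its first listed pronunciation.

    Alternate entries such as "WORD(1)" are ignored, and stress digits
    (0, 1, 2) are stripped from vowel symbols.
    """
    pronunciations = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head, *phones = line.split()
        if ALTERNATE_SUFFIX.search(head):
            continue  # alternate pronunciation; keep only the first
        # setdefault guards against a repeated plain headword.
        pronunciations.setdefault(head, [p.rstrip("012") for p in phones])
    return pronunciations


# Illustrative lines in the assumed format; not copied from the dictionary.
sample_lines = [
    ";;; a comment line",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
    "WE  W IY1",
]

print(first_pronunciations(sample_lines))
```

With the actual file, the same function could be fed the open file object line by line; the encoding to use when opening it is left for the reader to check.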
Other notes
While the Kilgarriff list is drawn from the British National Corpus, a quick inspection suggests that CMUdict uses American pronunciations rather than British ones.
References
- Doug Blumeyer, "Relative Frequencies of English Phonemes"
- CMU Pronouncing Dictionary (Local copy at version 0.7b. Retrieved May 28, 2018.)
- Adam Kilgarriff, word frequencies for the BNC (Local copy retrieved May 28, 2018.)