/ib4erule

Doing some word analysis concerning the 'I before E except after C' rule


Summary of the word files:

.words was generated using:
grep -iP -e '(ie|ei)' /usr/share/dict/words > .words

.wordsNoDupe was then generated from .words:
grep -ivP -e '(ier|iest|ed|ing|s|tion)$' .words > .wordsNoDupe

.wordsNoCaps was then generated:
grep -vP -e '^[A-Z]' .wordsNoDupe > .wordsNoCaps

.wordsNoHyphen:
grep -vP -e '-' .wordsNoCaps > .wordsNoHyphen
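
The four steps above can be chained into one function so the files are rebuilt in order and their sizes can be compared. This is only a sketch: `build_word_files` and its two arguments (dictionary path, output directory) are hypothetical names that are not part of the repo, and it assumes a grep with `-P` support, as the commands above already do.

```shell
# Sketch: rebuild the four word files and print their line counts.
# build_word_files is a hypothetical helper, not something the repo ships.
build_word_files() {
    local dict="${1:-/usr/share/dict/words}"   # source dictionary
    local out="${2:-.}"                        # directory for the dot-files

    # 1. every word containing 'ie' or 'ei'
    grep -iP -e '(ie|ei)' "$dict" > "$out/.words"
    # 2. drop common inflections (this is the "dupe" filter)
    grep -ivP -e '(ier|iest|ed|ing|s|tion)$' "$out/.words" > "$out/.wordsNoDupe"
    # 3. drop words starting with a capital letter
    grep -vP -e '^[A-Z]' "$out/.wordsNoDupe" > "$out/.wordsNoCaps"
    # 4. drop hyphenated words
    grep -vP -e '-' "$out/.wordsNoCaps" > "$out/.wordsNoHyphen"

    # show how much each step removed
    wc -l "$out/.words" "$out/.wordsNoDupe" "$out/.wordsNoCaps" "$out/.wordsNoHyphen"
}
```

Calling `build_word_files` with no arguments uses /usr/share/dict/words and writes into the current directory.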

Known issues:
Some deeper analysis needs to happen.
I realise that the dupe omissions also remove some legitimate words that are not actually duplicates of anything.
I'm not sure how big that number is, though.
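
One rough way to put a number on that is to look at every word the suffix filter removes and check whether its stem still appears in the dictionary. The sketch below is a heuristic, not a proper stemmer: `find_orphans` is a hypothetical helper, and treating stem, stem+e and stem+y as "the base word" is an assumption that will miss some cases and wrongly accept others.

```shell
# Sketch: list words removed by the dupe filter that have no obvious base word.
# find_orphans DICT WORDS   (hypothetical helper; heuristic only)
find_orphans() {
    local dict="$1" words="$2"
    awk '
        # first file: remember every dictionary word, lowercased
        NR == FNR { seen[tolower($0)] = 1; next }
        {
            stem = tolower($0)
            # skip words the dupe filter would keep (no suffix to strip)
            if (sub(/(ier|iest|ed|ing|s|tion)$/, "", stem) == 0) next
            # assumption: the base word is stem, stem+e or stem+y
            if (!((stem in seen) || ((stem "e") in seen) || ((stem "y") in seen)))
                print $0
        }
    ' "$dict" "$words"
}
```

For example, `find_orphans /usr/share/dict/words .words | wc -l` would give an estimate of how many removed words are not duplicates of anything.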

A large chunk of words that legitimately follow 'i before e' is also excluded by omitting 'ier' and 'iest'.
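
The size of that chunk can be counted directly from .words. In this sketch `count_ier_iest` is a hypothetical helper, and "follows the rule" is approximated as "the ending is not immediately preceded by c", which is an assumption about how the rule should be applied.

```shell
# Sketch: count the 'ier'/'iest' words that the dupe filter throws away.
# count_ier_iest is a hypothetical helper, not part of the repo.
count_ier_iest() {
    local words="${1:-.words}"
    local total followers
    # all words ending in 'ier' or 'iest'
    total=$(grep -ciP -e '(ier|iest)$' "$words")
    # same, but only where the ending is not preceded by 'c'
    followers=$(grep -ciP -e '(?<!c)(ier|iest)$' "$words")
    printf 'ier/iest words: %s, not after c: %s\n' "$total" "$followers"
}
```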