This repo contains a script, along with the data it produced, for compiling a dataset of British and American English. The script needs an external corpus, which it scans for keywords found predominantly in either British or American English. The data in this repo was generated from a corpus of Reddit comments.
To generate the data yourself, run ./gen_data.sh path-to-corpus, where path-to-corpus is the location of your corpus.
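
The keyword scan described above can be illustrated with a minimal Python sketch. This is not the code in gen_data.sh: the keyword lists, the `classify` function, and the tie-breaking rule are all hypothetical and only show the general idea of labelling a comment by which variety's keywords it contains more of.

```python
import re
from collections import Counter

# Hypothetical keyword lists for illustration only; the lists actually
# used by this repo are not reproduced here.
BRITISH = {"colour", "favourite", "realise"}
AMERICAN = {"color", "favorite", "realize"}

WORD_RE = re.compile(r"[a-z]+")


def classify(text):
    """Label one comment as 'british', 'american', or None.

    Counts keyword hits for each variety and returns the variety with
    more hits. Returns None when there are no hits or the counts tie.
    """
    counts = Counter()
    for word in WORD_RE.findall(text.lower()):
        if word in BRITISH:
            counts["british"] += 1
        elif word in AMERICAN:
            counts["american"] += 1
    if counts["british"] > counts["american"]:
        return "british"
    if counts["american"] > counts["british"]:
        return "american"
    return None
```

Under these assumptions, a comment such as "My favourite colour is blue" would be labelled British, while a comment containing keywords from both lists in equal number would be left unlabelled.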