Boggle is a word game designed by Allan Turoff and originally distributed by Parker Brothers. The game is played using a plastic grid of lettered dice, in which players attempt to find words in sequences of adjacent letters. -- Boggle - wikipedia
I like learning languages. I used to study Celtic philology, where I studied Irish, Welsh and other minority languages. During that time I got to visit the few places that still have native speakers of these languages and experienced a bit of their interesting cultures. I learned that there are fewer and fewer speakers and that great efforts are being made to preserve their linguistic habitats. Around the same time I was introduced to Boggle for English and spent a lot of time playing with friends, expanding and refreshing my vocabulary with each game. There wasn't a Boggle game for Irish, so I created one and later expanded the code to produce such boards for other languages and variants of the game (board sizes).
I imagine playing games in a minority language is a way to help preserve it. It can be fun for both native speakers and foreign learners.
I set myself a goal that is impossible on purpose: I want to create a Boggle version for every language there is, provided it is possible. Some scripts lend themselves to this game better than others; Japanese kanji or Morse code, for example, aren't a great fit. There are thousands of languages, and I expect to find interesting algorithmic challenges on the way.
boggle.brak.me - The results of the algorithm can be seen on my website as an off-line game
The biggest problem in creating a Boggle for a new language is obtaining a quality frequency-sorted word list for that language, so you won't really be able to enjoy this application unless you have some experience in finding such resources. Some linguistic knowledge will also be required to deal with languages that use non-Latin scripts and, in general, to assess whether the game you created is playable.
- python3
- 100 chars per line
- Keep it as simple as possible but not simpler
- Performance is important
iso - whenever used, refers to the 3-letter language identifier from ISO_639-3
iso2 - the 2-letter equivalent, for some external systems that use it
word form - an inflected word, an orthographically distinct form of a lemma
lemma - the canonical, dictionary form of a word, as opposed to word forms (see also Wikipedia)
frequency word list - a plain text file with word forms of a given language, sorted by frequency of use in the standard text, in descending order; the format is word1\nword2\n...
standard text - there is no such thing as a standard text of a language, yet we need one for frequency analysis; in its absence we will use subjective judgement and be happy with it
board_string - the ultimate goal of the experiment for a given language; it represents the dice walls, and splitting it into chunks of 6 gives you the dice; it is used to display the game
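The two data formats defined above can be illustrated with a minimal sketch. The function names `parse_wordlist` and `split_into_dice` are hypothetical and not taken from the project's code, and the sketch assumes every die wall is a single character, which may not hold for languages that put digraphs on one wall.

```python
from typing import List

FACES_PER_DIE = 6  # a die has six walls, per the board_string definition


def parse_wordlist(text: str) -> List[str]:
    """Return word forms in file order (most frequent first).

    Assumes the word1\\nword2\\n... format; blank lines are skipped.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_into_dice(board_string: str) -> List[str]:
    """Split a board_string into dice, six walls per die.

    Assumption: one character per wall.
    """
    if len(board_string) % FACES_PER_DIE != 0:
        raise ValueError("board_string length must be a multiple of 6")
    return [
        board_string[i:i + FACES_PER_DIE]
        for i in range(0, len(board_string), FACES_PER_DIE)
    ]
```

For example, a 24-character board_string would yield four dice, and a 4x4 board would need a string of 96 characters under the same assumption.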
Some definitions are used loosely. While I had some training in linguistics, a better linguist will be quick to spot opportunities for philosophical debate, because what is a letter, really? I'm not into this kind of debate. All useful comments are welcome, though.
The project uses Python logging to deliver two levels of information to log.txt:
DEBUG - all steps of the process are recorded
INFO - only the results meaningful to the end goal of finding the right string
You can also get the breakdown of all results using the report subcommand.
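As a rough sketch of how a two-level setup like this could be wired with the standard `logging` module: the function `configure_logging`, the logger name `boggle_sketch`, the `verbose` switch and the message format are all assumptions made for illustration, not the project's actual configuration.

```python
import logging


def configure_logging(path: str = "log.txt", verbose: bool = False) -> logging.Logger:
    """Send log records to a file at DEBUG (verbose) or INFO level.

    Hypothetical helper; the real project may configure logging differently.
    """
    logger = logging.getLogger("boggle_sketch")  # assumed logger name
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)

    # Close and drop handlers from earlier calls so records are not duplicated.
    for old in list(logger.handlers):
        old.close()
        logger.removeHandler(old)

    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger's handlers
    return logger
```

With `verbose=False`, calls to `logger.debug(...)` are dropped and only `logger.info(...)` and above reach the file, which matches the INFO description above.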
To run tests:
python tests.py
This project is mainly used with a human present, so not many tests are needed. More will come when cleverer algorithms are introduced.
- Rewrite Solver, the main algorithm, in Cython and see if performance gains are possible.
- Find out and document why running on PyPy is at least 10x slower (and gets slower as the script runs; leaks?)
This code is under the MIT License where applicable (a portion was taken from Stack Overflow, which is supposedly CC). The word lists and texts are a different matter and, to be honest, I haven't figured out the licensing issues; they are mighty complex. I am doing my best not to steal anyone's property but definitely will by accident. I guess the project will need to be restarted one day, once I have spent more time on these issues.