A brute force anagram finder, a teaching tool, and a one gram pun. For more information, please consult Lisa Simpson.
Because programming in Python should be fun! I frequently need teaching examples, and this seems like a good one: the topic is simple and familiar, with the implementation revealing some design choices and difficulties.
A number of people with whom I work use the Internet Anagram Server, and apparently we are not alone. It seems always to be down or overwhelmed by requests. To minimize its CPU usage, it works with a small list of common words. I'm interested in fewer, longer, and more interesting words.
For example, embrace inclusivity
is an anagram for
unmiscable veracity
cystic nerve bulimia
and a few other phrases --- but who knew?
Many teaching examples are not all that useful for improving one's programming. They are minimum working examples of .. something .., but programming is not best learned in MWE size bites, just as human language is not learned one word at a time. If you studied Spanish and learned that "me | gusta | la | pitón" is "to me | is pleasing | the | python," you are never going to learn Spanish. You might as well give up now.
-
A standard distro of Python. This program does not use external libraries, and the source is cut into a few files simply to give it a straightforward organization.
-
If you do not like the provided dictionaries (taken from the Webster's Second International Dictionary), you can use the
dictbuilder.py
file to create your own. If you run the program on Linux or Mac OS, the program should find the system dictionaries without your having to do anything extra. I have been told that Windows is a BYODictionary experience.
usage: anagrammar [-h] [-d DICTIONARY] [-m MIN_LEN]
[--nice {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19}] [-t CPU_TIME] [-v VERBOSE]
[-z]
phrase
A brute force anagram finder.
positional arguments:
phrase The phrase. If it contains spaces, it must be in quotes.
optional arguments:
-h, --help show this help message and exit
-d DICTIONARY, --dictionary DICTIONARY
Name of the dictionary of words, or a pickle of the dictionary.
-m MIN_LEN, --min-len MIN_LEN
Minimum length of any word in the anagram. The default is 3.
--nice {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19}
Niceness may affect execution time. The default is 7, which is about twice as
nice as the average program.
-t CPU_TIME, --cpu-time CPU_TIME
Set a maximum number of CPU seconds for execution. The default is 60.
-v VERBOSE, --verbose VERBOSE
Set the logging level on a scale from 10 to 50. The default is 35, which only
logs errors.
-z, --zap If set, remove old logfile[s].
How about george flanagin as the phrase? First, source the anagram.bash
file, and you will then be able to just type anagram
to run the program. The
output will appear on the screen, which is not always useful if there are a lot
of anagrams to your phrase.
Here is what the output looks like when we route the anagrams to gkf.txt
.
anagram -d words -m 3 georgeflanagin > gkf.txt
--cpu-time 60 --dictionary words --min-len 3 --nice 7 --phrase georgeflanagin --verbose 35 --zap False
0.857 : 262144
1.496 : 524288
2.031 : 786432
2.42user 0.03system 0:02.46elapsed 99%CPU (0avgtext+0avgdata 110416maxresident)k
0inputs+6968outputs (0major+24128minor)pagefaults 0swaps
In the gkf.txt
file, the most interesting anagram to me is long fearing age
,
and at the bottom of the file we see the "stats:"
Tree had 194092 paths.
Tree had 39147 distinct paths.
tries=803821 deadends=414962
The apparent speed is mainly due to running the program on a reasonably
fast CPU. The big number, 803821
, is the number of edges in the
exhaustive graph. 414962
is the number of edges whose leaves are
a value (combination of letters) that do not spell a word in the dictionary.
After the removing the dead-end edges, there were 194092
paths (NOTE: a path
consists of at least one edge, but generally more.) from the root to a valid
leaf.
Of course, many are duplicates: long -> fearing -> age is the same as
age -> long -> fearing. After removing the duplicates,
there were only 39147
anagrams of the original phrase.
You will notice in the output that the results are written this way, one anagram per line.
(('fragile',), ('gag',), ('enon', 'neon', 'none'))
Many small groups of characters spell more than one word, such
as ('enon', 'neon', 'none')
, and this allows for a more
compact representation than something like:
('fragile', 'gag', 'enon')
('fragile', 'gag', 'neon')
('fragile', 'gag', 'none')
There are no string operations in the code; it is all integer math.
The assumption in dictbuilder.py
is that the input is a file of words, white
space delimited. You type in something like:
python3 dictbuilder.py [-b] -i /path/to/your/file nameyouwanttouse
The result will be a file named nameyouwanttouse.numbers
that you can
reference in your calculation as nameyouwanttouse
. If you use the -b
option, the dictionary will ignore the prebuilt lists of 3, 4, and 5 letter
words, and use what's in the dictionary.
This is why God gave you the ability to fork the repo. The dictionaries are
built and read entirely within the file dictbuilder.py
, so you can change
this. There are only about 100 lines of code that do the work.
OK, first I have heard worse and I have committed much bigger sins. Recently. My best professional friend once described my programming as "entirely adequate." A written performance review once included the words "the only thing George knows how to do is write a compiler." Both statements are probably correct.
If it bothers you, you can experiment with AutoPEP8. It is excellent.