Before anything else, the answer is 78482 and the resulting code is CausesChallengeTrie.java.
I first went to the Wikipedia page and learned how the levenshtein distance is calculated.
I went straight to my editor to and started implementing the standard (Robert-Fisher) algorithm in Ruby. I did not do any research, I wanted to see how a straight forward, head-on approach would turn out. I saw the puzzle's data set is over 260000 words long but I went ahead.
This resulted in causes_challenge.rb. The code is pretty basic and crude. As expected this first approach did not work out. While correctly written, the performance using either ruby 1.9 or jruby is sub-optimal for this problem. I didn't wait for it to complete.
I had to find a better algorithm and switch to a faster language.
My language of choice for compute intensive tasks is Java (I never got to working with C/C++) so I switched to Java.
I had to go online to find a new approach/algorithm to this problem (I didn't study CS, I have to look this stuff up).
On Stackoverflow, I found someone with a similar need and along with his solution, using Trie structures. I found it elegant and I was happy to understand the logic and the trie structure quickly. I implemented it in Java.
This resulted in CausesChallengeTrie.java. In my first test I solved the problem in 12 minutes. Too slow (I have all weekend). So I went back online and looked for alternative approaches.
I should mention that I'm running all my tests on my older 1.8ghz Macbook Air, I do not have very performant hardware.
I had read about upcoming Lucene 4.0's greatly improved fuzzy search performance. Lucene will soon be using Levenshtein Automata. I didn't and still don't know how they work, but on paper they can find a leveinstein distance very quickly.
I couldn't find a standalone implementation of Levenshtein Automata so I first tried using Lucene.
This resulted in CausesChallengeLucene.java. This didn't work well enough. I got the correct answer but it was taking over 15 minutes because of Lucene's overhead.
I did find a standalone implementation of Levenshtein Automata.
This resulted in CausesChallengeAutomata.java. It worked but again it was too slow. I tweeked the automaton's source trying to make it perform better, I got some results but I was still over 10 minutes. I was not motivated to tweak the code further.
Well maybe not quite the holy grail, but good enough for me. I went back to the more elegant CausesChallengeTrie.java. I read up on writing performant Java code and how to tweak the JVM for my problem at hand.
After a few iterations, making subtle changes, I brought the execution time down from 12 minutes to 1.5 minutes. That is an improvement of 800%. I stopped there. That is good enough for me.
The Java Trie version can be compiled:
javac -O CausesChallengeTrie.java
and run:
time java -Xms720m -Xmx720m -server -XX:MaxNewSize=60m -XX:NewSize=60m -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:MaxPermSize=48M -XX:PermSize=48M -XX:+AggressiveOpts -XX:+OptimizeStringConcat -XX:MaxTenuringThreshold=0 -XX:NewRatio=2048 -XX:SoftRefLRUPolicyMSPerMB=10000 CausesChallengeTrie challenge_word_list.tmp.txt