This is a Ruby implementation of Peter Norvig's spelling corrector, with a Levenshtein distance fallback.
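The heart of Norvig's approach is generating every string one edit away from the input and keeping the ones that appear in the trained word collection. The sketch below is a minimal Ruby version of that idea. It assumes the collection is a Hash of word => count. The `LETTERS` constant and the two-argument `known` are hypothetical and do not match the repository's own signatures.

```ruby
# Letters used to build replacement and insertion candidates.
LETTERS = ("a".."z").to_a.freeze

# Every string one edit away from `word`: deletions, transpositions,
# replacements and insertions (Norvig's edits1).
def edits1(word)
  splits     = (0..word.length).map { |i| [word[0, i], word[i..-1]] }
  deletes    = splits.reject { |_, r| r.empty? }.map { |l, r| l + r[1..-1] }
  transposes = splits.select { |_, r| r.length > 1 }.map { |l, r| l + r[1] + r[0] + r[2..-1] }
  replaces   = splits.reject { |_, r| r.empty? }.flat_map { |l, r| LETTERS.map { |c| l + c + r[1..-1] } }
  inserts    = splits.flat_map { |l, r| LETTERS.map { |c| l + c + r } }
  (deletes + transposes + replaces + inserts).uniq
end

# Candidates found in the trained collection (a Hash of word => count).
# Returns nil instead of [] so it can sit in an `a || b || c` chain:
# an empty Array is truthy in Ruby and would stop the chain early.
def known(words, dictionary)
  found = words.select { |w| dictionary.key?(w) }
  found.empty? ? nil : found
end
```

Returning `nil` rather than an empty array matters in Ruby. Unlike Python's `or`, Ruby's `||` treats `[]` as true.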
If Norvig's algorithm doesn't find a correction, this implementation falls back to Levenshtein distance and returns the first word in the collection whose distance from the input is 8 or less. The lookup chain is:
known([word]) || known(edits1(word)) || known_edits2(word) || levenshtein(word) || ["NO SUGGESTION"]

Levenshtein costs: insertion = 2, deletion = 2, substitution = 1.
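A weighted Levenshtein distance with those costs, plus a "first match within 8" fallback, might look like the sketch below. The names `weighted_levenshtein` and `levenshtein_fallback` are hypothetical. So is the explicit `words` argument: the repository's `levenshtein(word)` takes only the word.

```ruby
# Weighted Levenshtein distance with the costs above:
# insertion = 2, deletion = 2, substitution = 1.
# Two-row dynamic programming; at the start of each outer iteration prev[j]
# is the cost of turning the first i-1 chars of `source` into the first j of `target`.
def weighted_levenshtein(source, target, ins: 2, del: 2, sub: 1)
  prev = (0..target.length).map { |j| j * ins }
  source.each_char.with_index(1) do |s_char, i|
    curr = [i * del]
    target.each_char.with_index(1) do |t_char, j|
      cost = s_char == t_char ? 0 : sub
      curr << [prev[j] + del, curr[j - 1] + ins, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Fallback: the FIRST word in the collection within max_distance
# (not necessarily the closest one), wrapped in an Array; nil if none.
def levenshtein_fallback(word, words, max_distance: 8)
  match = words.find { |candidate| weighted_levenshtein(word, candidate) <= max_distance }
  match && [match]
end
```

Because a substitution (1) is cheaper than a deletion plus an insertion (4), mismatched letters are always substituted. `find` stops at the first qualifying word, which mirrors the "first occurrence" behavior. Choosing the closest word would need something like `min_by` instead.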
I recommend first reading Norvig's explanation and a description of Levenshtein distance, then looking at the tests (in the specs directory). They show how each method works and make the algorithm easier to understand.
Most of the SpellingCorrector methods should be private. I left them public only so that the tests could document (explain) them.
require "./lib/spelling_corrector"
corrector = SpellingCorrector.new
corrector.correct "cen"          # => "can"
corrector.correct "unknownword"  # => "NO SUGGESTION"

PersistedSpellingCorrector and PersistedWordCollection are MongoDB-backed implementations that wrap the non-persisted ones in order to persist the corrections and the trained word collection.
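The wrapping idea can be sketched as a small decorator that remembers answers from an inner corrector. This is a minimal illustration only. A plain Hash stands in for the MongoDB collection, and the classes `StubCorrector` and `CachedCorrector` are hypothetical; they do not reflect the actual PersistedSpellingCorrector code.

```ruby
# Hypothetical stand-in for the in-memory corrector being wrapped.
class StubCorrector
  attr_reader :calls

  def initialize(answers)
    @answers = answers
    @calls = 0
  end

  def correct(word)
    @calls += 1
    @answers.fetch(word, "NO SUGGESTION")
  end
end

# Wraps any object responding to #correct and stores its answers.
# The Hash plays the role the MongoDB collection would play.
class CachedCorrector
  def initialize(inner, store = {})
    @inner = inner
    @store = store
  end

  def correct(word)
    return @store[word] if @store.key?(word)

    @store[word] = @inner.correct(word)
  end
end
```

Since the wrapper only depends on `#correct`, the inner corrector stays unaware of persistence. Swapping the Hash for a database-backed store would not change its interface.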
The examples directory contains two examples: one using refinements, and one using Sinatra to expose the spelling corrector as a web API.
If you are using Ruby 2.0.0 or later, you can refine the String class with the spelling corrector.
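A refinement of this kind could be defined roughly as in the sketch below. The module name `StringCorrectionRefinement` and the `CORRECTIONS` lookup table are hypothetical stand-ins, not the repository's StringSpellingCorrectorRefinement or a real SpellingCorrector. The `using` call activates the refinement for the rest of the file, and the repository's example file does the same.

```ruby
# Hypothetical lookup table standing in for a real SpellingCorrector.
CORRECTIONS = { "cen" => "can" }.freeze

# Adds String#correct, visible only in files that opt in with `using`.
module StringCorrectionRefinement
  refine String do
    def correct
      CORRECTIONS.fetch(self, "NO SUGGESTION")
    end
  end
end

using StringCorrectionRefinement

puts "cen".correct
```

Unlike monkey-patching String directly, the refinement leaves String unchanged in files that do not call `using`.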
# examples/refinement_spelling_corrector.rb
using StringSpellingCorrectorRefinement
puts "cen".correct

The web app example is deployed on Heroku, so you can test it with curl or directly in the browser (shame on you).
curl spelling-corrector.herokuapp.com/correct/cen
=> can

Since it uses PersistedSpellingCorrector, you will need a MongoDB connection to run it locally.
This code is licensed under:
MIT License, GPL