/spelling-corrector

It's a Ruby implementation of Norvig Spelling Corrector plus Levenshtein distance fallback.

Primary LanguageRuby

Spelling Corrector

It's a Ruby implementation of Norvig Spelling Corrector plus Levenshtein distance fallback.

If Norvig algorithm doesn't find the correction, this implementation will look for the first occurrence (distance <= 8) of a similar word using Levenshtein distance.

known([word]) || known(edits1(word)) || known_edits2(word) || levenshtein(word) || ["NO SUGGESTION"]

Levenshtein costs: ins=2, del=2 and sub=1.

The Algorithm

Firstly, I recommend to read the Norvig explanation and Levenshtein distance then have a look at the tests (specs directory), they show how each method work, it helps the understading of the algorithm.

Most of the SpellingCorrector methods, should be private, I left them as public only to document (explain) them with tests.

How to use it

require "lib/spelling_corrector"

corrector = SpellingCorrector.new
corrector.correct "cen" => "can"

corrector.correct "unknownword" => "NO SUGGESTION"

Persisted Spelling Corrector

The PersistedSpellingCorrector and PersistedWordCollection are implementions using MongoDB (encapsulating the non-persisted implementations) to persisted the corrections and trained word collection.

Examples

In the examples directory, there are two examples, one using refinements and another with Sinatra to expose Spelling Corrector as an API.

Refinements

If you are using Ruby 2.0.0 we can use refine your string classes using the Spelling Corrector.

# examples/refinement_spelling_corrector.rb

using StringSpellingCorrectorRefinement

puts "cen".correct

webapp

The webapp example is deployed at Heroku, you can easily test it via curl or directly in the browser (shame on you).

curl spelling-corrector.herokuapp.com/correct/cen
=> can

Since it uses PersistedSpellingCorrector, to run it locally, you will need a MongoDB connection.

License

This code is licensed under:

MIT License GPL