A trigrams and similarity refinement for Ruby strings.

Primary language: Ruby. License: MIT.

Trigrams and Similarities for Strings

What it does

The #trigrams method returns an array of trigrams for the string.

The #similarity method compares the trigrams of the current string with those of the string passed as an argument and returns a Float between 0.0 and 1.0 that quantifies how similar the two strings are. The comparison is case sensitive by default; pass true as the case_insensitive parameter to make it case insensitive.
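The scores in the example further down (8/14 ≈ 0.571 and 5/17 ≈ 0.294) are consistent with a Jaccard-style ratio: the number of shared trigrams divided by the number of distinct trigrams across both strings. The sketch below illustrates that ratio only; it is not the refinement's source. The helper name jaccard_similarity is hypothetical, and the two trigram lists were written out by hand assuming pg_trgm-style padding.

```ruby
# Hypothetical helper (not part of this library): Jaccard similarity
# of two trigram lists, i.e. shared trigrams / distinct trigrams overall.
def jaccard_similarity(trigrams_a, trigrams_b)
  a = trigrams_a.uniq
  b = trigrams_b.uniq
  union_size = (a | b).size
  return 0.0 if union_size.zero? # two empty lists: define similarity as 0.0

  (a & b).size.to_f / union_size
end

# Hand-written trigram lists (assumption: two leading spaces and one
# trailing space of padding per word, as pg_trgm does).
celebrities = ["  c", " ce", "cel", "ele", "leb", "ebr", "bri", "rit", "iti", "tie", "ies", "es "]
celebrity   = ["  c", " ce", "cel", "ele", "leb", "ebr", "bri", "rit", "ity", "ty "]

# 8 shared trigrams out of 14 distinct ones.
puts jaccard_similarity(celebrities, celebrity)
```

Under this reading, a score of 1.0 means both strings produce the same set of trigrams, and 0.0 means they share none.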

How to use it

A Ruby refinement is a safer alternative to monkey-patching, particularly when you are modifying the behaviour of “someone else's” class: a core Ruby class, a Rails class, or a gem class.

Ruby 2.4 refinements documentation
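To show why a refinement is safer than a monkey patch, here is a minimal sketch of the scoping behaviour. ShoutExtension, Greeter and String#shout are hypothetical names invented for this illustration; they are not part of this library.

```ruby
# Hypothetical refinement, for illustration only.
module ShoutExtension
  refine String do
    def shout
      upcase + "!"
    end
  end
end

module Greeter
  # The refinement is active from here to the end of this module body.
  using ShoutExtension

  def self.greet(name)
    "hello #{name}".shout
  end
end

puts Greeter.greet("world") # prints "HELLO WORLD!"

begin
  "hello".shout # outside Greeter the refinement is not active
rescue NoMethodError
  puts "String#shout is not visible outside Greeter"
end
```

Unlike a monkey patch, the added method does not leak into code that has not opted in with using.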

By activating the extension with using inside a class or module, you can call the methods directly on any string within that scope.

Example

Given the following module and calls:

module Test
  using StringSimilarityExtensions
  def self.trigrams(string)
    string.trigrams
  end
  def self.similarity(string1, string2, case_insensitive = false)
    string1.similarity(string2, case_insensitive)
  end
end

Test.trigrams("a")

Test.similarity("celebrities", "Celebrity")
Test.similarity("celebrities", "celebrity")

Test.similarity("celebrities", "Celebrity", true)
Test.similarity("celebrities", "celebrity", true)

An IRB session then produces:

2.4.4 :001 > module Test
2.4.4 :002?>     using StringSimilarityExtensions
2.4.4 :003?>     def self.trigrams(string)
2.4.4 :004?>         string.trigrams
2.4.4 :005?>       end
2.4.4 :006?>     def self.similarity(string1, string2, case_insensitive = false)
2.4.4 :007?>         string1.similarity(string2, case_insensitive)
2.4.4 :008?>       end
2.4.4 :009?>   end
 => :similarity
2.4.4 :010 >
2.4.4 :011 >   Test.trigrams("a")
 => ["  a", " a "]
2.4.4 :012 >
2.4.4 :013 >   Test.similarity("celebrities", "Celebrity")
 => 0.29411764705882354
2.4.4 :014 > Test.similarity("celebrities", "celebrity")
 => 0.5714285714285714
2.4.4 :015 >
2.4.4 :016 >   Test.similarity("celebrities", "Celebrity", true)
 => 0.5714285714285714
2.4.4 :017 > Test.similarity("celebrities", "celebrity", true)
 => 0.5714285714285714
2.4.4 :018 >

Implementation

The trigram implementation is intended to reproduce the trigrams generated by the PostgreSQL pg_trgm extension.
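As a rough sketch of what reproducing pg_trgm's trigrams involves: pg_trgm treats runs of non-alphanumeric characters as word separators and pads each word with two leading spaces and one trailing space before taking every three-character window. The helper below is a hypothetical illustration of that rule, not the refinement's actual source. Unlike pg_trgm, which lowercases its input, it preserves case, on the assumption that this is how the case-sensitive default described above is achieved.

```ruby
# Hypothetical helper (not this library's source): pg_trgm-style trigrams.
def pg_trgm_style_trigrams(string)
  string
    .split(/[^[:alnum:]]+/) # non-alphanumeric runs separate words
    .reject(&:empty?)       # drop the empty piece left by a leading separator
    .flat_map { |word| "  #{word} ".chars.each_cons(3).map(&:join) } # pad, then slide a 3-char window
    .uniq
end

p pg_trgm_style_trigrams("a") # prints ["  a", " a "]
```

The result for "a" matches the Test.trigrams("a") output shown in the example above.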