Casecommons/pg_search

English dictionary stem words are different from the example

thechrisoshow opened this issue · 4 comments

My problem started when I tried replicating the example in the readme:

class BoringTweet < ActiveRecord::Base
  include PgSearch::Model
  pg_search_scope :kinda_matching,
                  against: :text,
                  using: {
                    tsearch: {dictionary: "english"}
                  }
  pg_search_scope :literally_matching,
                  against: :text,
                  using: {
                    tsearch: {dictionary: "simple"}
                  }
end

sleepy = BoringTweet.create! text: "I snoozed my alarm for fourteen hours today. I bet I can beat that tomorrow! #sleepy"
sleeping = BoringTweet.create! text: "You know what I like? Sleeping. That's what. #enjoyment"
sleeper = BoringTweet.create! text: "Have you seen Woody Allen's movie entitled Sleeper? Me neither. #boycott"

BoringTweet.kinda_matching("sleeping") # => [sleepy, sleeping, sleeper]
BoringTweet.literally_matching("sleeping") # => [sleeping]

When I tried this, a 'kinda_matching' search for 'sleeping' returned only the 'sleeping' record. Looking into it, the stems for sleepy, sleeping and sleeper appear to be different:

select to_tsvector('sleepy');
=> 'sleepi':1

select to_tsvector('sleeping');
=> 'sleep':1

select to_tsvector('sleeper');
=> 'sleeper':1
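
To show why those stems lead to a single result, here is a rough plain-Ruby sketch. It does not talk to PostgreSQL or pg_search: the `REPORTED_STEMS` hash just copies the `to_tsvector` output quoted above, and `stem_matches` is a hypothetical helper that assumes a tsearch-style match needs the query and the document word to reduce to the same stem.

```ruby
# Stems copied from the to_tsvector output quoted in this issue.
# This hash stands in for the database; it is not a real stemmer.
REPORTED_STEMS = {
  "sleepy"   => "sleepi",
  "sleeping" => "sleep",
  "sleeper"  => "sleeper"
}.freeze

# Hypothetical helper (not part of pg_search): keep only the words whose
# stem equals the stem of the query term.
def stem_matches(query, words, stems = REPORTED_STEMS)
  target = stems.fetch(query)
  words.select { |word| stems.fetch(word) == target }
end

matches = stem_matches("sleeping", %w[sleepy sleeping sleeper])
puts matches.inspect # prints ["sleeping"]
```

Under that assumption, only "sleeping" shares the stem 'sleep', which lines up with the single record I get back from `kinda_matching`.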

Are there different versions of the 'english' catalog perhaps?

I'm running PostgreSQL 14.4 on aarch64-apple-darwin20.6.0, compiled by Apple clang version 12.0.5 (clang-1205.0.22.9), 64-bit

Interesting! I wrote these examples over a decade ago and haven't thought to keep checking them. I'm going to see what I can find out.

Indeed, I also get the same results. I'll update the examples!

Fixed!

Thanks! Happy it wasn't just me!