cockroachdb/cockroach

sql: support tsvector, tsquery based full text search

rohany opened this issue · 5 comments

Postgres has some advanced text search types and features that Django uses that we we don't support --
https://www.postgresql.org/docs/10/functions-textsearch.html
https://www.postgresql.org/docs/10/datatype-textsearch.html

This includes the tsvector and regconfig types and operations on them.

  • parsers (tokenizing)
  • the rest of the token types supported by Postgres's default parser: #97669
  • dictionaries (stemming and stopwords - made by parameterizing a template)
  • templates (templates for dictionaries)
  • configurations (pair of parser and dictionary)
  • regconfig datatype
  • regdictionary datatype
  • tsvector datastructure
  • tsquery datastructure
  • to_tsvector
  • to_tsquery
  • phraseto_tsquery
  • plainto_tsquery
  • websearch_to_tsquery
  • tsquery_phrase
  • accelerated searches with @@
  • GIN index on tsvectors
  • weights
  • ts_rank
  • ts_rank_cd
  • strip(tsvector)
  • length(tsvector)
  • setweight(tsvector)
  • array_to_tsvector(text[])
  • numnode(tsquery)
  • querytree(tsquery)
  • ts_delete(tsvector, text)
  • ts_headline
  • ts_rewrite
  • tsvector_to_array(tsvector)
  • unnest(tsvector)
  • inverted join on tsvectors

And SQL operators for tsvector:

  • @@ (matching a tsvector and tsquery)
  • tsvector || tsvector (concatenation of tsquery)
  • tsquery || tsquery (OR 2 tsqueries)
  • tsquery && tsquery (AND 2 tsqueries)
  • tsquery <-> tsquery (followedby 2 tsqueries()
  • !! tsquery (negate a tsquery)
  • tsquery @> tsquery (contains)
  • tsquery <@ tsquery (contained-by)

Jira issue: CRDB-5464
Epic: CRDB-22357

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@fqazi unsupported function array_to_tsvector affects efcore.pg tests

@giangpham712 Can you confirm if we need anything else here, this might be one of the easier ones to address

@fqazi Related to full text search, we have multiple issues

Unsupported features:

array_to_tsvector
setweight
ts_delete
ts_headline
regconfig
regdictionary
length(tsvector)
numnode
querytree
ts_rank_cd
ts_rewrite
tsquery_phrase
strip
websearch_to_tsquery
unaccent(string, varchar)

Other issues:

unsupported comparison operator: to_tsvector("ContactTitle") @@ to_tsquery('owner') || to_tsquery('foo'): unsupported 
binary operator: <tsquery> || <tsquery>
unsupported comparison operator: <tsquery> && <tsquery>
unsupported comparison operator: <tsquery> @> <tsquery>
unsupported comparison operator: <tsquery> <@ <tsquery>
unsupported binary operator: <tsquery> || <tsquery>
unsupported binary operator: <tsvector> || <tsvector>
SELECT !!to_tsquery('a & b')
FROM "Customers" AS c
LIMIT 1

returns error plainto_tsquery(): text-search query doesn't contain lexemes: a

@giangpham712 Let's skip tests with tsvector for now, let me follow up with the queries team for the issue you found