sql: support tsvector, tsquery based full text search
rohany opened this issue · 5 comments
Postgres has some advanced text search types and features that Django uses that we we don't support --
https://www.postgresql.org/docs/10/functions-textsearch.html
https://www.postgresql.org/docs/10/datatype-textsearch.html
This includes the tsvector
and regconfig
types and operations on them.
- parsers (tokenizing)
- the rest of the token types supported by Postgres's default parser: #97669
- dictionaries (stemming and stopwords - made by parameterizing a template)
- templates (templates for dictionaries)
- configurations (pair of parser and dictionary)
- regconfig datatype
- regdictionary datatype
-
tsvector
datastructure -
tsquery
datastructure -
to_tsvector
-
to_tsquery
-
phraseto_tsquery
-
plainto_tsquery
-
websearch_to_tsquery
-
tsquery_phrase
- accelerated searches with
@@
- GIN index on tsvectors
- weights
-
ts_rank
-
ts_rank_cd
-
strip(tsvector)
- length(tsvector)
-
setweight(tsvector)
-
array_to_tsvector(text[])
-
numnode(tsquery)
-
querytree(tsquery)
-
ts_delete(tsvector, text)
-
ts_headline
-
ts_rewrite
-
tsvector_to_array(tsvector)
-
unnest(tsvector)
- inverted join on tsvectors
And SQL operators for tsvector:
-
@@
(matching a tsvector and tsquery) - tsvector || tsvector (concatenation of tsquery)
- tsquery || tsquery (OR 2 tsqueries)
- tsquery && tsquery (AND 2 tsqueries)
- tsquery <-> tsquery (followedby 2 tsqueries()
- !! tsquery (negate a tsquery)
- tsquery @> tsquery (contains)
- tsquery <@ tsquery (contained-by)
Jira issue: CRDB-5464
Epic: CRDB-22357
We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!
@fqazi unsupported function array_to_tsvector
affects efcore.pg tests
@giangpham712 Can you confirm if we need anything else here, this might be one of the easier ones to address
@fqazi Related to full text search, we have multiple issues
Unsupported features:
array_to_tsvector
setweight
ts_delete
ts_headline
regconfig
regdictionary
length(tsvector)
numnode
querytree
ts_rank_cd
ts_rewrite
tsquery_phrase
strip
websearch_to_tsquery
unaccent(string, varchar)
Other issues:
unsupported comparison operator: to_tsvector("ContactTitle") @@ to_tsquery('owner') || to_tsquery('foo'): unsupported
binary operator: <tsquery> || <tsquery>
unsupported comparison operator: <tsquery> && <tsquery>
unsupported comparison operator: <tsquery> @> <tsquery>
unsupported comparison operator: <tsquery> <@ <tsquery>
unsupported binary operator: <tsquery> || <tsquery>
unsupported binary operator: <tsvector> || <tsvector>
SELECT !!to_tsquery('a & b')
FROM "Customers" AS c
LIMIT 1
returns error plainto_tsquery(): text-search query doesn't contain lexemes: a
@giangpham712 Let's skip tests with tsvector for now, let me follow up with the queries team for the issue you found