VectorQuery + highlight() fails
Closed this issue · 7 comments
I need to perform a VectorQuery with text filters plus highlight, but it fails with: redisvl.exceptions.RedisSearchError: Error while searching: Property vector_distance is not in schema
The same works OK for FilterQuery.
Following is a complete reproduction script:
import sys
sys.path.append('.')

import numpy as np

from src.store.get_redis import getRedis
from redisvl.index import SearchIndex
from redisvl.query.filter import Tag, Num, FilterExpression, Text
from redisvl.query import VectorQuery, FilterQuery


def vector_highlight_issue():
    r = getRedis()
    r.flushall()
    schema = {
        "index": {
            "name": "user_simple",
            "prefix": "user_simple_docs",
        },
        "fields": [
            {"name": "user", "type": "text"},
            {"name": "credit_score", "type": "tag"},
            {"name": "job", "type": "text"},
            {"name": "age", "type": "numeric"},
            # {"name": "vector_distance", "type": "numeric"},
            {
                "name": "user_embedding",
                "type": "vector",
                "attrs": {
                    "dims": 3,
                    "distance_metric": "cosine",
                    "algorithm": "flat",
                    "datatype": "float32",
                },
            },
        ],
    }
    data = [
        {
            'user': 'Sebastian Gurin',
            'age': 1,
            'job': 'engineer',
            'credit_score': 'high',
            'user_embedding': np.array([0.4, 0.3, 0.5], dtype=np.float32).tobytes(),
        },
        {
            'user': 'Sebastian Martinez',
            'age': 2,
            'job': 'doctor',
            'credit_score': 'low',
            'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
        },
        {
            'user': 'Maria Cristina Miños',
            'age': 3,
            'job': 'dentist',
            'credit_score': 'medium',
            'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes(),
        },
    ]
    index = SearchIndex.from_dict(schema)
    index.set_client(r)
    index.create(overwrite=True)
    keys = index.load(data)

    filter_expression = Text("user") % "Sebas*"
    return_fields = ["user", "age", "job", "credit_score"]
    query = VectorQuery(
        vector=[0.1, 0.1, 0.5],
        vector_field_name="user_embedding",
        return_fields=return_fields,
        num_results=3,
        filter_expression=filter_expression,
    )
    query.highlight(fields=['user'])

    # FilterQuery + highlight works fine:
    # query = FilterQuery(
    #     return_fields=return_fields,
    #     num_results=3,
    # )
    # query.set_filter(filter_expression)
    # query.highlight(fields=['user'])

    results = index.query(query)
    print('LEN', len(results))
    for doc in results:
        print(doc)


if __name__ == "__main__":
    vector_highlight_issue()
Thanks for reporting. Will look into this today. Which version of redisvl and redis-py are you using?
redis 5.0.0
redisvl 0.3.9
This looks like a bug! There appears to be a problem with the underlying FT.SEARCH query, where HIGHLIGHT doesn't work alongside a KNN query. I'll talk to our search engineers and update this issue as we learn more.
This query works (full-text search with HIGHLIGHT):
127.0.0.1:6379> "FT.SEARCH" "user_simple" "@user:(Sebas*)" "RETURN" "4" "user" "age" "job" "credit_score" "DIALECT" "2" "LIMIT" "0" "3" HIGHLIGHT FIELDS 1 user
1) (integer) 2
2) "user_simple_docs:7a65714003254c1a988325d7192c90da"
3) 1) "user"
2) "<b>Sebastian</b> Gurin"
3) "age"
4) "1"
5) "job"
6) "engineer"
7) "credit_score"
8) "high"
4) "user_simple_docs:0aa7f56ab0604a388ebd73c69ce77e64"
5) 1) "user"
2) "<b>Sebastian</b> Martinez"
3) "age"
4) "2"
5) "job"
6) "doctor"
7) "credit_score"
8) "low"
This query also works (full-text search pre-filter and KNN query without HIGHLIGHT):
127.0.0.1:6379> "FT.SEARCH" "user_simple" "@user:(Sebas*)=>[KNN 3 @user_embedding $vector AS vector_distance]" "RETURN" "5" "user" "age" "job" "credit_score" "vector_distance" "SORTBY" "vector_distance" "ASC" "DIALECT" "2" "LIMIT" "0" "3" "params" "2" "vector" "\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?"
1) (integer) 2
2) "user_simple_docs:0aa7f56ab0604a388ebd73c69ce77e64"
3) 1) "vector_distance"
2) "0"
3) "user"
4) "Sebastian Martinez"
5) "age"
6) "2"
7) "job"
8) "doctor"
9) "credit_score"
10) "low"
4) "user_simple_docs:7a65714003254c1a988325d7192c90da"
5) 1) "vector_distance"
2) "0.129070281982"
3) "user"
4) "Sebastian Gurin"
5) "age"
6) "1"
7) "job"
8) "engineer"
9) "credit_score"
10) "high"
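For reference, the escaped binary `$vector` parameter in the raw commands above is just the little-endian float32 serialization of the query vector `[0.1, 0.1, 0.5]`. A minimal sketch with numpy, assuming the schema's default float32 datatype:

```python
import numpy as np

# Serialize the query vector the same way the stored embeddings are serialized
# in the reproduction script (float32, little-endian byte order)
query_vector = np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()

# Matches the escaped bytes shown in the raw FT.SEARCH command above
assert query_vector == b"\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?"
print(query_vector)
```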
But this query fails (full-text pre-filter and KNN query, with HIGHLIGHT):
127.0.0.1:6379> "FT.SEARCH" "user_simple" "@user:(Sebas*)=>[KNN 3 @user_embedding $vector AS vector_distance]" "RETURN" "5" "user" "age" "job" "credit_score" "vector_distance" "SORTBY" "vector_distance" "ASC" "DIALECT" "2" "LIMIT" "0" "3" "params" "2" "vector" "\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?" HIGHLIGHT FIELDS 1 user
(error) Property `vector_distance` is not in schema
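Until a fix ships, one possible workaround is to run the VectorQuery without `.highlight()` and wrap the matched terms client-side. A rough sketch (the `highlight_prefix` helper below is hypothetical, not part of redisvl, and only mimics the simple `Sebas*` prefix filter from the reproduction):

```python
import re

def highlight_prefix(text, prefix, open_tag="<b>", close_tag="</b>"):
    # Wrap every word starting with `prefix`, mimicking the `Sebas*` prefix filter
    pattern = re.compile(r"\b(%s\w*)" % re.escape(prefix), re.IGNORECASE)
    return pattern.sub(open_tag + r"\1" + close_tag, text)

# Applied to each doc returned by the KNN query (run without .highlight()):
print(highlight_prefix("Sebastian Gurin", "Sebas"))     # <b>Sebastian</b> Gurin
print(highlight_prefix("Sebastian Martinez", "Sebas"))  # <b>Sebastian</b> Martinez
```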
Aha, thanks! A couple of comments:
- I tried to add a vector_distance field to the schema myself, but then it fails with "duplicated schema field vector_distance"
- The same error also happens when using .summarize()
BTW amazing library, keep up the good work!
Thanks @cancerberoSgx :) -- btw the error reported back here is just what comes from the Redis server and search library within the core. It's a bit of a red herring as the issue has nothing to do with the schema (but nice attempt at trying!!!)
We will bring this to product management from Redis to see what the status is of fixes. In the meantime, mind also sharing what version of redis you are using?
> In the meantime, mind also sharing what version of redis you are using?
'redis_version': '7.4.1',
@tylerhutcherson @abrookins it would be awesome if you could share any follow-up on this bug in the underlying projects, if there's an issue or PR. Thanks!
@cancerberoSgx We've merged a fix for this in our RediSearch module here: RediSearch/RediSearch#5623
The fix has been back-ported to various RediSearch versions and should go out in future releases of the module for users who install it directly. We'll also pick it up in Redis Stack, which bundles various modules with Redis. I don't have a timeline, so I recommend watching either of these repositories for release events to get notified of future releases.
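To tell whether a given server already carries the patched module, you can look at the RediSearch version reported by MODULE LIST. A sketch of parsing that reply (the reply shape assumes redis-py's decoded list-of-dicts form from `module_list()`, and the version number in the example is hypothetical):

```python
def search_module_version(module_list_reply):
    # Parse a MODULE LIST reply (list of dicts as returned by redis-py's
    # module_list()) and return the `search` module version, or None if
    # the module is not loaded. Handles bytes or str keys/values.
    for mod in module_list_reply:
        name = mod.get(b"name", mod.get("name"))
        if name in (b"search", "search"):
            return mod.get(b"ver", mod.get("ver"))
    return None

# Hypothetical decoded reply from a Redis Stack server:
reply = [{"name": "search", "ver": 21005}]
print(search_module_version(reply))  # 21005
```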
Because this is a bug in a Redis module, I'm going to close this issue. Thanks for reporting!