All semantic search items have a score of 0
johnqiuwan opened this issue · 8 comments
I followed the docs to run a real-time search query. The setup went smoothly, and the query runs without errors.
However, I noticed that every item returned by the semantic search query has a score of 0.
Is that normal?
@johnqiuwan Thanks for reaching out! Could you provide more details on the problem and a reproducible code example?
Thank you for the quick reply!
Sample code that runs the semantic search:
```php
<?php

namespace App\Services;

use RedisVentures\RedisVl\Vectorizer\Factory;
use RedisVentures\RedisVl\VectorHelper;
use RedisVentures\RedisVl\Query\VectorQuery;
use RedisVentures\RedisVl\Index\SearchIndex;
use Predis\Client;

class VectorQueryService
{
    protected $factory;
    protected $vectorProvider;
    protected $vectorHelper;
    protected $index;

    public function __construct()
    {
        //
        $this->factory = new Factory();
        $this->vectorProvider =
            $this->factory->createVectorizer('openai', env('TEXT_EMBEDDING_MODEL'));
        $this->vectorHelper = new VectorHelper();
        $this->index = new SearchIndex(new Client(), $this->schema());
        $this->index->create();
    }

    private function schema()
    {
        $schema = [
            'index' => [
                'name' => 'idx:product',
                'prefix' => 'laravel_hemes_database_product_by_id:',
                'storage_type' => 'json',
            ],
            'fields' => [
                'id' => [
                    // 'path' => '$.id',
                    'type' => 'numeric',
                ],
                'description' => [
                    // 'path' => '$.description',
                    'type' => 'text',
                ],
                'vector' => [
                    // 'path' => '$.description_embeddings',
                    'type' => 'vector',
                    'dims' => 1536,
                    'datatype' => 'float32',
                    'algorithm' => 'flat',
                    'distance_metric' => 'cosine'
                ],
                'image' => [
                    'type' => 'tag'
                ],
                'slug' => [
                    // 'path' => '$.slug',
                    'type' => 'tag',
                ],
                'product_name_text' => [
                    // 'path' => '$.product_name',
                    'type' => 'text',
                ],
                'price' => [
                    // 'path' => '$.price',
                    'type' => 'numeric',
                    // 'sortable' => true,
                ],
                'current_price' => [
                    // 'path' => '$.price',
                    'type' => 'numeric',
                    // 'sortable' => true,
                ],
                'created' => [
                    // 'path' => '$.created_at',
                    'type' => 'numeric',
                    // 'sortable' => true,
                ],
                'variant_options' => [
                    // 'path' => '$.variant_options',
                    'type' => 'tag',
                ],
                'model' => [
                    // 'path' => '$.product_specifications.model',
                    'type' => 'text',
                ],
                'category' => [
                    //'path' => '$.product_specifications.category',
                    'type' => 'tag',
                ],
                'manufactory' => [
                    // 'path' => '$.product_specifications.manufactory[*]',
                    'type' => 'tag',
                ],
            ],
        ];
        return $schema;
    }

    public function embed($text)
    {
        $embedding = $this->vectorProvider->embed($text);
        $embedding = $embedding['data'][0]['embedding'];
        if (!is_array($embedding)) {
            $embedding = [$embedding];
        }
        return $embedding;
    }

    public function query($embedding)
    {
        // $embedding = [VectorHelper::toBytes($embedding)];
        $query = new VectorQuery(
            $embedding,
            'vector',
            ['id', 'description', 'product_name_text', 'variant_options', 'model', 'category', 'manufactory', 'price', 'current_price', 'slug', 'image'],
            10,
            true,
            3
        );
        return $this->index->query($query);
    }

    public function processResult($result)
    {
        return collect($result)->map(function ($product, $key) {
            return collect($product)->transform(function ($value) {
                return json_decode($value, true);
            });
        })->values()->toArray();
    }

    public function resultDto($result)
    {
        return collect($result)->map(function ($product, $key) {
            $product['id'] = $product['id'][0];
            $product['description'] = $product['description'][0];
            $product['product_name'] = $product['product_name_text'][0];
            $product['slug'] = $product['slug'][0];
            return $product;
        })->toArray();
    }
}
```
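For reference, Redis expects the KNN query vector as a raw binary blob, which the commented-out `VectorHelper::toBytes()` line in `query()` presumably takes care of. The sketch below does that conversion with plain `pack()`, assuming the FLOAT32 field from the schema above and little-endian byte order. `toFloat32Blob()` and `fromFloat32Blob()` are hypothetical helpers, not part of RedisVL, and the dimension check simply mirrors `'dims' => 1536`.

```php
<?php

/**
 * Hypothetical helper: pack an embedding into a little-endian FLOAT32 blob,
 * the binary layout a FLOAT32 vector field expects as a KNN query parameter.
 * Also checks the length against the schema's `dims` (1536 here).
 */
function toFloat32Blob(array $embedding, int $dims = 1536): string
{
    if (count($embedding) !== $dims) {
        throw new InvalidArgumentException(
            sprintf('Expected %d dimensions, got %d', $dims, count($embedding))
        );
    }

    // 'g' = 32-bit float, little-endian; array_values() guarantees a list for the spread
    return pack('g*', ...array_values(array_map('floatval', $embedding)));
}

/**
 * Inverse of toFloat32Blob(), handy for inspecting what is actually sent.
 */
function fromFloat32Blob(string $blob): array
{
    // unpack() returns a 1-indexed array; reindex it from 0
    return array_values(unpack('g*', $blob));
}

$blob = toFloat32Blob([0.5, -1.25, 2.0], 3);
echo strlen($blob), PHP_EOL; // 3 values x 4 bytes = 12
```

A 1536-dimension embedding from text-embedding-3-small packs to 6144 bytes this way; values that are not exactly representable in float32 are rounded, which is expected for a FLOAT32 index.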
Context:
- Using RedisJSON to store the embedding data
- Using the OpenAI text-embedding-3-small model to generate the embeddings (1536 dimensions)
Already checked:
- The RedisJSON index is created successfully
- The embedding data is stored successfully in RedisJSON
- There are no errors when performing the search
Problem:
All returned items have a score of 0.
Expected behavior:
The scores should not all be 0.
Versions:
- Predis: 2.2
- PHP: 8.1
- Redis Stack: 7.2.4
- OS: macOS
Additional context:
If a vector value is updated in RedisJSON, the search results update accordingly. The search itself seems to work; it is only the scores that are all 0.
Are there any updates on this, @vladvildanov? Thank you.
@johnqiuwan By default, Redis calculates scores based on term frequency and the term's occurrences in the document. Could you try one of the other scorers Redis provides out of the box? This feels like something on the server side.
https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/scoring/
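To see why changing the scorer may not move these numbers, it helps to look at the raw command. A KNN query is an FT.SEARCH call whose query string assigns an alias to the distance (`AS vector_score` below); that distance comes back as an ordinary field, separate from the per-document score that WITHSCORES reports and that SCORER controls. The sketch below only assembles the argument list to make that split visible; `buildKnnSearchArgs()` and its defaults are hypothetical, and the exact command RedisVL sends may differ.

```php
<?php

/**
 * Hypothetical helper: build raw FT.SEARCH arguments for a KNN query.
 * The vector distance is exposed under $scoreAlias; SCORER only changes
 * the text-relevance score that WITHSCORES returns.
 */
function buildKnnSearchArgs(
    string $index,
    string $vectorField,
    string $blob,
    int $k = 10,
    array $returnFields = [],
    string $scoreAlias = 'vector_score',
    ?string $scorer = null
): array {
    $args = [
        'FT.SEARCH',
        $index,
        // Single quotes keep $vec literal: it names the PARAMS entry below
        sprintf('*=>[KNN %d @%s $vec AS %s]', $k, $vectorField, $scoreAlias),
        'WITHSCORES',
    ];

    // Always return the distance alongside the requested fields (no duplicates)
    $fields = array_values(array_unique(array_merge($returnFields, [$scoreAlias])));
    array_push($args, 'RETURN', (string) count($fields), ...$fields);

    if ($scorer !== null) {
        array_push($args, 'SCORER', $scorer);
    }

    // Nearest first: a smaller distance means a closer match
    array_push(
        $args,
        'SORTBY', $scoreAlias, 'ASC',
        'LIMIT', '0', (string) $k,
        'PARAMS', '2', 'vec', $blob,
        'DIALECT', '2'
    );

    return $args;
}
```

Sending these arguments through a client's raw-command interface (for example Predis `executeRaw()`) and reading the `vector_score` field gives the distance directly, whichever scorer is selected.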
Thank you for the update! I have read the doc you linked, but I'm still not sure why all returned items have a score of 0. It looks like a bug to me, yet the results do change when the embeddings are updated. This is kind of strange to me, lol
I appreciate your time and your amazing work.
Thank you so much! Let me know if you find something, or feel free to contribute 👌
Hello there,
I'm facing the same issue, with pretty much the same schema as yours, @johnqiuwan. As you suggested, @vladvildanov, I tried forcing a scorer other than the default TFIDF, but I'm still getting 0. By the way, this could be an idea for improving your library: just add a scorer parameter to the VectorQuery class and then, in getSearchArguments():
```php
if ($this->scorer) {
    $searchArguments = $searchArguments->scorer($this->scorer);
}
```
It would allow users to set a different scorer.
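The proposed option could look roughly like the sketch below: an optional scorer name, checked against the built-in scorers listed on the Redis scoring page, that emits a `SCORER` clause only when set. `ScorerOption` is a hypothetical class, not RedisVL's `VectorQuery` API, and the allow-list may be incomplete for newer Redis versions.

```php
<?php

/**
 * Hypothetical sketch of the proposed `scorer` option: validate the name
 * against Redis's built-in scorers and emit SCORER only when one is set.
 */
final class ScorerOption
{
    // Built-in scorers from the Redis scoring docs (may not be exhaustive)
    private const BUILT_IN = ['TFIDF', 'TFIDF.DOCNORM', 'BM25', 'DISMAX', 'DOCSCORE', 'HAMMING'];

    public function __construct(private ?string $scorer = null)
    {
        if ($scorer !== null && !in_array(strtoupper($scorer), self::BUILT_IN, true)) {
            throw new InvalidArgumentException("Unknown scorer: {$scorer}");
        }
    }

    /** Arguments to append to FT.SEARCH; empty means the server default applies. */
    public function toArguments(): array
    {
        return $this->scorer === null ? [] : ['SCORER', strtoupper($this->scorer)];
    }
}
```

Its `toArguments()` result would be appended to the search arguments at the point shown in the snippet above; leaving the scorer unset keeps the server default (TFIDF).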
Anyway, as a workaround, I was able to retrieve the vector_score by adding it to the return fields:
```php
$query = new VectorQuery(
    $embedding,
    'sentence_embedding',
    ['id', 'sentence', 'vector_score'], // Vector score is added here
    scorer: 'BM25', // Proposed parameter, not available yet: this would be nice
);
```
The difference is that vector_score is a distance: the lower it is, the closer the sentence is to the query.
Hope this helps! And thanks for your work, @vladvildanov
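Assuming the COSINE metric from the schema posted above, `vector_score` is a cosine distance, which Redis defines as 1 minus the cosine similarity (0 for identical directions, up to 2 for opposite ones). A minimal post-processing sketch, assuming each result is an associative array whose `vector_score` holds the numeric string Redis returns: it adds a `similarity` key and orders rows nearest-first. `rankByCosine()` is a hypothetical helper, not a RedisVL function.

```php
<?php

/**
 * Hypothetical post-processing: turn Redis COSINE distances into
 * similarities and order results nearest-first.
 */
function rankByCosine(array $results, string $scoreField = 'vector_score'): array
{
    foreach ($results as &$row) {
        // Redis returns field values as strings, so cast before doing math
        $distance = (float) $row[$scoreField];
        $row['similarity'] = 1.0 - $distance; // COSINE distance = 1 - cosine similarity
    }
    unset($row); // drop the reference left behind by foreach-by-reference

    // Larger similarity (smaller distance) first
    usort($results, fn (array $a, array $b) => $b['similarity'] <=> $a['similarity']);

    return $results;
}
```

Sorting on the derived similarity is equivalent to sorting the raw distance in ascending order; the extra key just makes the "higher is better" reading explicit for callers expecting a score.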
@tfortin Thank you for your feedback! I will take a look soon.