Search issue - Products that have no text reference on the search
rqsakai opened this issue · 9 comments
In some cases, we get search result items that seem unrelated to the search term.
Preconditions
Adobe Commerce ver. 2.4.6-p3
ElasticSuite Open Source ver. 2.11.4.2
OpenSearch 1.2.4
Environment : Production
Steps to reproduce
Expected result
- Should show Z12N000G0-1
Actual result
- Sku Apple_MacBookPro_M2_pro_max_14 is on the page and in the first position
- Sku Z12N000G0-1 is in the last position on the page
File debug.md attached with all available configuration, search query and response.
debug.md
Hello @rqsakai,
It looks like you are using a custom theme. Are you sure that your theme's settings have no effect here? Also note that on the search result page, Sort By is set to Product Name A-Z.
BR,
Vadym
Hi, thanks for your prompt response.
1 - I'm sure that the custom theme is not interfering with the ES implementation; it only contains layout updates.
2 - Although the sorting was set manually, the other product should not appear on the page at all. Why is it being returned by our filter?
I think the second point is our actual issue, and I cannot work out why the other product is being returned in the search response.
Hi @rqsakai, how are you mate?
Are you able to replay your Elasticsearch query after adding "explain" : "true" to its body, at the same level as "from" or "size", and attach the produced results here? I'll be glad to have a look.
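For illustration, here is a minimal Python sketch of what adding that flag at the top level of a search body could look like. The query shown is a hypothetical placeholder, not the real query from debug.md, and the helper name `with_explain` is made up for this example. It uses the boolean form of the flag.

```python
import copy
import json


def with_explain(search_body: dict) -> dict:
    """Return a copy of the search body with the top-level explain flag set."""
    body = copy.deepcopy(search_body)  # leave the caller's dict untouched
    body["explain"] = True             # sits next to "from" and "size"
    return body


# Hypothetical placeholder query; substitute the body logged by your store.
original = {
    "from": 0,
    "size": 12,
    "query": {"match": {"sku": "Z12N000G0-1"}},
}

print(json.dumps(with_explain(original), indent=2))
```

The printed body can then be sent to the index's `_search` endpoint with whatever client you normally use; each hit in the response should carry an `_explanation` section describing how its score was computed.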
Best regards !
Hi @romainruaud, thank you for your response. Here is the request response from the production OpenSearch.
debug.explain.md
To keep it short, the query is matching portions of "Z12N000G0-1" in the "sku" field of the MacBook Pro.
Most probably due to the presence of the following SKUs in the parent product. I guess they're configuration SKUs.
"Z17K000N8",
"Z17G000NA",
"Z17K002J0",
"Z17K002HY",
"Z17G002TN",
"Z17G002NR",
"Z17G002TQ",
"Z17G002TR",
"Z17G002TS",
"Z17G002KP",
"Z17G002HS",
"Z17J000AJ",
"Z17J0015U",
"Z17J000AK"
Due to how the SKU field is handled, it's not surprising that the engine is matching on stuff like "000".
Imho, disabling the merging of children SKUs could resolve this.
But I invoke the almighty @rbayet regarding the matching part.
Regards
Hello @rqsakai,
Indeed, by default the reference_word_delimiter used in the reference analyzer used for the sku attribute will split your sku based on digit/letter transition to allow partial sku matching while at the same time dropping the original full length "as is" sku.
As @romainruaud said, since your configurable product contains a lot of quite similar SKUs, there's a pretty high chance that it's getting multiple partial hits on your original searched SKU, hence its score is higher than that of the simple product.
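To make the partial-hit effect concrete, here is a rough Python sketch. It only approximates a split on digit/letter transitions and on non-alphanumeric characters with a regular expression; it is not the actual ElasticSuite analyzer, which may apply further filters. The helper name `split_sku` is hypothetical.

```python
import re


def split_sku(sku: str) -> list[str]:
    """Split a SKU into lowercase runs of letters or digits.

    Rough approximation only: breaks on letter/digit transitions and
    drops separators such as "-".
    """
    return [part.lower() for part in re.findall(r"[A-Za-z]+|[0-9]+", sku)]


searched = split_sku("Z12N000G0-1")
child = split_sku("Z17K000N8")  # one of the child SKUs listed above

print(searched)
print(child)
print(sorted(set(searched) & set(child)))  # fragments both SKUs share
```

Under this approximation the searched SKU and a single child SKU already share several short fragments, including "000". A parent product that aggregates many such child SKUs can therefore accumulate a number of partial matches.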
You could try preventing the merging of the children SKUs in your parent (configurable, bundle) products, but you would then lose the ability to find parent products when searching exclusively for a non-visible on its own child simple product.
Before doing that, I would suggest enabling the following settings in Elasticsuite > Search Relevance:
- Spellchecking configuration > Terms vectors configuration > [Experimental] Use all tokens from term vectors
- Spellchecking configuration > Terms vectors configuration > [Experimental] Use reference analyzer in term vectors
- Relevance configuration > Exact match configuration > [Experimental] Use default analyzer in exact matching filter query
Please be aware that "SKU matching" objectives are usually far removed from "words/keywords matching" in a fulltext search context, so there will always be a trade-off of some sort.
Regards,
Hi, just to confirm: even after enabling the vector configuration we get the exact same behavior. Is there anything that can be done in this case?
Hi @rqsakai
Yes, you should probably prevent the "sku" field of the children products from being indexed in the parents: you should add SKU into this list via a di.xml file.