Smile-SA/elasticsuite

standard_edge_ngram_analyzer not working correctly

Denis2310 opened this issue · 4 comments

Preconditions

Magento Version : 2.4.6-p4

ElasticSuite Version : 2.11.6

Environment : Developer

Third party modules :

Steps to reproduce

  1. Go to the Magento admin
  2. Create a product named "rotational"
  3. Set the product name attribute to use standard_edge_ngram_analyzer
  4. Set elasticsuite -> search relevance -> exact match configuration -> Use default analyzer in exact matching filter query -> Yes
  5. Set elasticsuite -> spellchecking configuration -> term vectors configuration -> Use edge ngram analyzer in term vectors -> Yes

Expected result

  1. Magento admin -> elasticsuite -> system -> analysis -> select corresponding index -> select standard_edge_ngram_analyzer
  2. Type rotational keyword
  3. Tokens: rot, rota, rotat, rotati, rotatio, rotation, rotationa, rotational are shown

Actual result

  1. Magento admin -> elasticsuite -> system -> analysis -> select corresponding index -> select standard_edge_ngram_analyzer

  2. Type rotational keyword

  3. Tokens: rot, rota, rotat, rotat are shown

  4. If I test with a different keyword, "rotationale", the tokens are: rot, rota, rotat, rotati, rotatio, rotation, rotationa, rotational

  5. If I test with a different keyword, "rotationals", the tokens are: rot, rota, rotat

  6. If I test with a different keyword, "rotationa", the tokens are: rot, rota, rotat, rotati, rotatio, rotation, rotationa

Why does the output depend on the keyword? Tokens are not always generated from 3 characters up to the whole string.

min_gram = 3
max_gram = 20
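For reference, the expected tokens follow from plain edge n-gram generation with these settings. Below is a minimal Python sketch of that behaviour, assuming the token reaches the edge n-gram step unchanged; `edge_ngrams` is a hypothetical helper written for illustration, not ElasticSuite or Elasticsearch code.

```python
def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 20) -> list[str]:
    """Return the prefixes of `token` with lengths from min_gram up to max_gram."""
    # Prefixes cannot be longer than the token itself.
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


# "rotational" has 10 characters, so prefixes of length 3 to 10 are produced.
print(edge_ngrams("rotational"))
```

With `min_gram = 3` and `max_gram = 20`, this yields the eight tokens listed under "Expected result".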

[Screenshots: Analysis screen in Magento Admin, 2024-07-18]

Hi,

Most probably because the word is stemmed before being sent to the edge_n_gram filter: https://github.com/Smile-SA/elasticsuite/blob/2.11.x/src/module-elasticsuite-core/etc/elasticsuite_analysis.xml#L293
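The effect of stemming before edge n-gram generation can be sketched in Python as follows. This is an illustration only: the `OBSERVED_STEMS` table is a hypothetical stand-in for the real stemmer, and its entries are simply inferred from the longest token reported for each keyword in this issue, assuming stemming is indeed the cause.

```python
# Hypothetical stem table: each stem is the longest token reported in the issue
# for that keyword. It is NOT the output of a real stemmer implementation.
OBSERVED_STEMS = {
    "rotational": "rotat",
    "rotationals": "rotat",
    "rotationale": "rotational",
    "rotationa": "rotationa",
}


def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 20) -> list[str]:
    """Return the prefixes of `token` with lengths from min_gram up to max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(word: str) -> list[str]:
    """Stem first, then build edge n-grams from the stem (not from the original word)."""
    stem = OBSERVED_STEMS.get(word, word)
    return edge_ngrams(stem)


for word in OBSERVED_STEMS:
    print(word, "->", analyze(word))
```

Because the n-grams are built from the stem, the longest token can never be longer than the stem, which is why the result varies from keyword to keyword. The sketch does not try to reproduce the duplicated "rotat" token shown in the report.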

Hi @romainruaud, what does "stemmed" mean? Should I remove these from the filter list, or create a new custom filter?

<filter ref="stemmer_override" />
<filter ref="stemmer" />

https://www.elastic.co/guide/en/elasticsearch/reference/current/stemming.html

You could try to create another analyzer equivalent to the standard_edge_ngram but without the stemmer filter.

Then check on the Analysis screen what output your words produce with this new analyzer.

regards
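Outside the Analysis screen, a similar check could be made against the Elasticsearch `_analyze` API with an ad-hoc filter chain that contains no stemmer. The Python sketch below only builds the request body; the chain (standard tokenizer, `lowercase`, then `edge_ngram`) is an assumed simplification and not the exact filter list of the ElasticSuite analyzer, and `build_analyze_request` is a hypothetical helper.

```python
import json


def build_analyze_request(text: str, min_gram: int = 3, max_gram: int = 20) -> dict:
    """Build an _analyze request body with an inline edge_ngram filter and no stemmer."""
    return {
        "tokenizer": "standard",
        "filter": [
            "lowercase",
            # Inline (anonymous) token filter definition; no stemmer in the chain.
            {"type": "edge_ngram", "min_gram": min_gram, "max_gram": max_gram},
        ],
        "text": text,
    }


# Print the JSON body; it would be sent with POST to the cluster's /_analyze endpoint.
print(json.dumps(build_analyze_request("rotational"), indent=2))
```

If the response lists every prefix from "rot" up to "rotational", that supports the stemmer being the cause of the truncated tokens.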

Works fine, thanks @romainruaud!