standard_edge_ngram not working fine when using capital letters
Denis2310 opened this issue · 2 comments
Preconditions
Magento Version : 2.4.6-p4
ElasticSuite Version : 2.11.6
Environment : Development
Third party modules :
Steps to reproduce
- Create product called "Rotational viscometer: ViscoQC 100 H"
- Set standard_edge_ngram as default analyzer for product title
- Search for "viscoq", product should be shown because "viscoq" is part of product name
- Search for "ViscoQ" or "viscoQ", product should be shown because both are part of product name
Expected result
- Create product called "Rotational viscometer: ViscoQC 100 H"
- Set standard_edge_ngram as default analyzer for product title
- Search for "viscoq", product shown in result
- Search for "ViscoQ" or "viscoQ", product shown in result
Actual result
- Create product called "Rotational viscometer: ViscoQC 100 H"
- Set standard_edge_ngram as default analyzer for product title
- Search for "viscoq", product shown in result
- Search for "ViscoQ" or "viscoQ", product is not shown in result
I will close this; I have created a plugin that converts the search query text to lowercase.
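use Magento\Search\Model\Query;
use Magento\Search\Model\QueryFactory;

class LowercaseQueries
{
    /**
     * After plugin on QueryFactory::get(): lowercase the query text once per Query instance.
     */
    public function afterGet(QueryFactory $subject, Query $result): Query
    {
        // The '_lowercased' flag avoids redoing the work on repeated get() calls.
        if (!$result->hasData('_lowercased')) {
            $result->setQueryText(mb_strtolower((string) $result->getQueryText()));
            $result->setData('_lowercased', true);
        }

        return $result;
    }
}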
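For a plugin like the one above to run, it has to be declared in a module's di.xml. The fragment below is a minimal sketch of such a declaration; the module name Vendor_Module, the plugin name and the Vendor\Module\Plugin namespace are hypothetical placeholders, since the original class is shown without a namespace.

```xml
<?xml version="1.0"?>
<!-- etc/di.xml of a hypothetical Vendor_Module; adjust names to your own module -->
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="urn:magento:framework:ObjectManager/etc/config.xsd">
    <type name="Magento\Search\Model\QueryFactory">
        <!-- "type" must be the fully qualified class name of the plugin class -->
        <plugin name="vendor_module_lowercase_queries"
                type="Vendor\Module\Plugin\LowercaseQueries"/>
    </type>
</config>
```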
Hello @Denis2310,
Long story short, it could be related to one of the filters of your custom analyzer.
The main culprits I see would be
- either the word delimiter filter ("word_delimiter" or "reference_word_delimiter"), which generates word parts on case transitions in addition to letter/digit transitions,
- or the absence of the "lowercase" filter.
Here is what happens with an unmodified standard analyzer:
viscoQ becomes
- "viscoQ" and then "viscoq" (x2)
If both word_delimiter.preserve_original and word_delimiter.catenate_all are false, then
viscoQ becomes
- "visco Q" and then "visco q"
=> This will not match "viscoq".
If in addition the "lowercase" filter is missing, then
viscoQ becomes
- "visco Q"
=> This will not match "viscoq" either.
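The three cases above can be reproduced with a small simulation. This is only a simplified sketch, not the Lucene/Elasticsearch implementation: the function analyzeToken() and its parameters are hypothetical, and it models nothing beyond splitting on lower-to-upper case changes and letter/digit transitions, the preserve_original and catenate_all options, and an optional lowercase step. Unlike the shortened description above, it also lists the generated word parts in the first case.

```php
<?php
declare(strict_types=1);

/**
 * Simplified model of a word_delimiter filter followed by an optional lowercase filter.
 * Hypothetical helper for illustration only; the real filter has more rules.
 */
function analyzeToken(string $token, bool $preserveOriginal, bool $catenateAll, bool $lowercase): array
{
    // Split on lower-to-upper case changes and on letter/digit transitions.
    $parts = preg_split(
        '/(?<=\p{Ll})(?=\p{Lu})|(?<=\p{L})(?=\p{N})|(?<=\p{N})(?=\p{L})/u',
        $token,
        -1,
        PREG_SPLIT_NO_EMPTY
    );

    if (count($parts) <= 1) {
        // Nothing to split: the token goes through unchanged.
        $tokens = [$token];
    } else {
        $tokens = $preserveOriginal ? [$token] : [];
        $tokens = array_merge($tokens, $parts);
        if ($catenateAll) {
            $tokens[] = implode('', $parts);
        }
    }

    return $lowercase ? array_map('mb_strtolower', $tokens) : $tokens;
}

// preserve_original and catenate_all enabled, lowercase present
echo implode(' ', analyzeToken('viscoQ', true, true, true)), PHP_EOL;    // viscoq visco q viscoq
// both options disabled, lowercase present
echo implode(' ', analyzeToken('viscoQ', false, false, true)), PHP_EOL;  // visco q
// both options disabled, lowercase missing
echo implode(' ', analyzeToken('viscoQ', false, false, false)), PHP_EOL; // visco Q
```

In the last two cases the token list no longer contains "viscoq", which is the situation described above.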
If you customized the "standard_edge_ngram" analyzer and replaced its "word_delimiter" filter with the "reference_word_delimiter" filter, that could be the origin of the problem, since the two filters are configured differently:
<filter name="word_delimiter" type="word_delimiter" language="default">
    <generate_word_parts>true</generate_word_parts>
    <catenate_words>true</catenate_words>
    <catenate_numbers>true</catenate_numbers>
    <catenate_all>true</catenate_all>
    <split_on_case_change>true</split_on_case_change>
    <split_on_numerics>true</split_on_numerics>
    <preserve_original>true</preserve_original>
</filter>
<filter name="reference_word_delimiter" type="word_delimiter" language="default">
    <generate_word_parts>true</generate_word_parts>
    <catenate_words>false</catenate_words>
    <catenate_numbers>false</catenate_numbers>
    <catenate_all>false</catenate_all> <!-- <== -->
    <split_on_case_change>true</split_on_case_change>
    <split_on_numerics>true</split_on_numerics>
    <preserve_original>false</preserve_original> <!-- <== -->
</filter>
If you didn't apply any changes, please check that you enabled the following experimental settings in search relevance
- Spellchecking configuration > Terms vectors configuration > [Experimental] Use all tokens from term vectors
- Spellchecking configuration > Terms vectors configuration > [Experimental] Use edge ngram analyzer in term vectors
- Relevance configuration > Exact match configuration > [Experimental] Use default analyzer in exact matching filter query
Regards,