[BUG] Improve template search functionality
Closed this issue · 4 comments
When searching templates in ARCitect, the search results are sometimes suboptimal.
OS and framework information:
- OS: Ubuntu 22.04
- ARCitect version: v0.0.40
Describe the bug
Example:
- Searching by template name:
  - There are multiple templates in the primary list titled "ENA - XXXX".
  - Typing "ENA" in the search bar turns up 0 results.
  - Typing "ENA -" in the search bar now turns up just one of the results.
Screenshots example 1, template search
There are several templates named "ENA - ..." in the list of templates:
However, when we enter "ENA" in the search box, we get no results:
And when we type "ENA -" in the search bar, we get one of them as a result:
@Freymaurer can you move this to Swate?
Hey! Could you pls open two issues for this? As the two problems you describe are not related to each other. Feel free to keep this one for Template search and open another one for term search.
done 👍
The reason behind this behavior is our search algorithm. We use the Sørensen-Dice coefficient on string bigrams. A lot of fancy words for "we look for similarity, and the more similar the two strings we compare, the higher the score". To filter out unfit results we apply a threshold. In your example, "ENA - " actually has more similarity to "SRA - Sequencing" than to the longer ENA names. For example, in "ENA - Gene promoter annotated sequence" we have ~30 mismatched characters, while in "SRA - Sequencing" we have only 11. This very flexible calculation allows for semi-similar result search. To avoid the issues you describe, we now adjust the score as follows:
- Increase score drastically if it starts with query (+0.5)
- Increase score if contains query (+0.3)
Note
Threshold is 0.3
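For illustration, here is a minimal Python sketch of the scoring described above. It is not the actual Swate/ARCitect implementation. The function names (`dice_coefficient`, `adjusted_score`, `search_templates`) are hypothetical. Lowercasing the input, counting bigrams as a multiset, and applying only the larger of the two bonuses are all assumptions the thread does not confirm.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Multiset of adjacent character pairs (case-insensitive; an assumption)."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def dice_coefficient(a: str, b: str) -> float:
    """Sørensen-Dice coefficient on bigrams: 2 * |A ∩ B| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 0.0
    overlap = sum((x & y).values())  # multiset intersection
    return 2 * overlap / total


def adjusted_score(query: str, name: str) -> float:
    """Dice score plus the bonuses from the thread: +0.5 prefix, +0.3 substring.

    Assumption: only the larger bonus applies when both conditions hold.
    """
    score = dice_coefficient(query, name)
    q, n = query.lower(), name.lower()
    if q and n.startswith(q):
        score += 0.5
    elif q and q in n:
        score += 0.3
    return score


def search_templates(query: str, names: list[str], threshold: float = 0.3) -> list[str]:
    """Return names scoring at or above the threshold, best match first."""
    scored = [(adjusted_score(query, name), name) for name in names]
    return [name for score, name in sorted(scored, key=lambda p: -p[0]) if score >= threshold]


if __name__ == "__main__":
    names = ["ENA - Gene promoter annotated sequence", "SRA - Sequencing"]
    print(search_templates("ENA", names))
    print(search_templates("ENA - ", names))
```

Under these assumptions, the raw Dice score for "ENA - " is 0.4 against "SRA - Sequencing" but only about 0.24 against "ENA - Gene promoter annotated sequence", which reproduces the reported behavior with a threshold of 0.3. With the prefix bonus, the ENA template rises to about 0.74 and ranks first.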