Cyrillic letters
yoyurec opened this issue · 13 comments
This is a bug, but I'm not sure where it happens yet.
Let me explain the current logic:
Step 1: fireSeqSearch reads all your notes and feeds them to tantivy (https://docs.rs/tantivy/latest/tantivy/), and tantivy does the search, including ranking the hits.
Step 2: fireSeqSearch adds highlights to the hits with a very naive algorithm. AFAIK tantivy doesn't tell us how it makes its decisions.
So although the highlighting in this case is terrible, I have a question for you: do you think the top hit in this case is a real hit or a false positive?
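To make Step 2 concrete, here is a minimal sketch of what a naive highlighter could look like. This is not the actual fireSeqSearch code (the thread does not show it); the function name `highlight_naive` and the `<mark>` tags are assumptions made for illustration. It also shows how the symptom reported in this issue can arise: if the query is split into single letters before highlighting, every matching letter gets marked.

```rust
/// Hypothetical naive highlighter (illustration only, not fireSeqSearch's code).
/// Scans `text` left to right; wherever one of `terms` starts, the longest
/// matching term is wrapped in <mark>...</mark>. Matching is case-sensitive.
fn highlight_naive(text: &str, terms: &[&str]) -> String {
    let mut out = String::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Pick the longest non-empty term that matches at the current position.
        let hit = terms
            .iter()
            .filter(|t| !t.is_empty() && rest.starts_with(**t))
            .max_by_key(|t| t.len());
        match hit {
            Some(t) => {
                out.push_str("<mark>");
                out.push_str(t);
                out.push_str("</mark>");
                // `t` is a prefix of `rest`, so this slice is on a char boundary.
                rest = &rest[t.len()..];
            }
            None => {
                // No term starts here: copy one character and move on.
                let ch = rest.chars().next().unwrap();
                out.push(ch);
                rest = &rest[ch.len_utf8()..];
            }
        }
    }
    out
}
```

Called with the whole word, `highlight_naive("Тестовая статья", &["статья"])` marks only "статья". Called with the letters of the word as separate terms, it marks each matching letter individually, which resembles the broken highlight in the report.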
Yes, the page title contains the search word.
Thank you, that confirms my first guess. I'll try to fix that part.
Could you please provide some articles[1] so I can run some tests on them?
Thank you
[1]: under an open license such as CC, or ones you hold the copyright to
search for word "статья" - https://www.google.com/search?q=%D1%81%D1%82%D0%B0%D1%82%D1%8C%D1%8F&oq=%D1%81%D1%82%D0%B0%D1%82%D1%8C%D1%8F&sourceid=chrome&ie=UTF-8
demo file: Тестовая статья.md
no rust (((
binary would be awesome! tnx
Hi, you can download the zip file at https://github.com/Endle/fireSeqSearch/releases/tag/dev_issue59 . It was compiled with MSYS2 (so it's a bit too big).
The Windows binary should work if you run it from any Windows terminal. If not, I'll provide an MSVC binary tomorrow (GitHub Actions is currently building it).
Same result, the same letters are wrongly highlighted: every single letter (not ok) + the whole word (ok) ((
Is the monkeyscript the same, or should it be updated too?
Nope. I'm 100% sure this is a bug on the server side.
It seems there were two bugs in my previous code, and I had only fixed one :)
I added a mitigation so that the tokenizer is only applied to Chinese text.
Please go to https://github.com/Endle/fireSeqSearch/releases/tag/dev_issue59 and try the v2 binary.
Thanks
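The mitigation mentioned above (applying the tokenizer only to Chinese) could be sketched roughly as follows. This is an assumption about the general idea, not the code that went into the v2 binary: the names `is_cjk_ideograph` and `tokenize` are hypothetical, the check covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the per-character split stands in for a real Chinese word segmenter.

```rust
/// Hypothetical check: is `c` in the basic CJK Unified Ideographs block?
/// (Assumption: extension blocks are ignored to keep the sketch short.)
fn is_cjk_ideograph(c: char) -> bool {
    ('\u{4E00}'..='\u{9FFF}').contains(&c)
}

/// Hypothetical tokenizer dispatch, illustration only.
/// Text containing Chinese characters goes through the "Chinese" path;
/// everything else (Cyrillic, Latin, ...) is split on whitespace, so whole
/// words stay intact and are never broken into single letters.
fn tokenize(text: &str) -> Vec<String> {
    if text.chars().any(is_cjk_ideograph) {
        // Stand-in for a real Chinese segmenter: one token per character.
        text.chars()
            .filter(|c| !c.is_whitespace())
            .map(|c| c.to_string())
            .collect()
    } else {
        text.split_whitespace().map(|w| w.to_string()).collect()
    }
}
```

With this dispatch, "Тестовая статья" yields the two words "Тестовая" and "статья", while "测试" is still split per character.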
Sorry, I just found a bug that I had just introduced.
Please try the v3 binary. Thanks
It's weird, it worked fine on my computer
I've merged all the changes into the master branch and uploaded v4 to https://github.com/Endle/fireSeqSearch/releases/tag/dev_issue59
This time I didn't compile it myself; it was built by GitHub Actions, the same way as the public releases.
If it still fails, can you try running the server with these environment variables set:
RUST_BACKTRACE=1 RUST_LOG=debug
Sorry for making you test so many times :(