cozy/cozy.github.io

Better tokenizer


Searching for "fromOldClient" on docs.cozy.io does not return any result.
Searching for "CozyClient.fromOldClient" does return the result.

IMHO this is caused by the tokenizer, which does not treat the dot as a separator between two words, so "fromOldClient" is never indexed as a word on its own.

I think we tweaked the tokenizer because we needed doctypes to be returned as is. That is, the dot should not split doctypes like "io.cozy.bills", but it should split "CozyClient.fromOldClient".

Since a doctype can be recognized by having at least two dots and no leading capital letter, we could maybe improve the tokenizer to support both cases.
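To make the idea concrete, here is a minimal sketch of that heuristic in Python. It is not the search plugin's actual code: the names DOCTYPE, WORD_SEPARATOR and tokenize are hypothetical, and the doctype pattern (lowercase first letter, at least two dots) is only the assumption stated above.

```python
import re

# Assumed doctype shape: starts with a lowercase letter and contains
# at least two dots, e.g. "io.cozy.bills".
DOCTYPE = re.compile(r"[a-z][a-z0-9-]*(?:\.[a-z0-9-]+){2,}")

# First pass: split on anything that is not a word character, a dot or a hyphen,
# so dotted names stay in one chunk until we decide what they are.
WORD_SEPARATOR = re.compile(r"[^\w.-]+")


def tokenize(text: str) -> list[str]:
    tokens = []
    for chunk in WORD_SEPARATOR.split(text):
        chunk = chunk.strip(".")  # drop sentence-ending dots
        if not chunk:
            continue
        if DOCTYPE.fullmatch(chunk):
            # Looks like a doctype: keep it as is.
            tokens.append(chunk)
        else:
            # Anything else: the dot separates words.
            tokens.extend(part.lower() for part in chunk.split(".") if part)
    return tokens


print(tokenize("CozyClient.fromOldClient"))
print(tokenize("io.cozy.bills"))
```

One known limitation of this sketch: an all-lowercase dotted name with two or more dots that is not a doctype would also be kept whole.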

See the current tokenizer setting:

tokenizer: "[^a-z\u0430-\u044F\u04510-9\\-\\.]"
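The effect of this separator can be reproduced outside the site with a few lines of Python. This is only an illustration: it assumes the text is lowercased before being split (the character class has no uppercase range), and current_tokens is a hypothetical helper, not part of the search plugin.

```python
import re

# Same character class as the setting above: every character that is NOT a
# lowercase Latin letter, a Cyrillic letter, a digit, "-" or "." is a separator.
SEPARATOR = re.compile(r"[^a-z\u0430-\u044F\u04510-9\-.]")


def current_tokens(text: str) -> list[str]:
    # Assumption: input is lowercased before splitting.
    return [token for token in SEPARATOR.split(text.lower()) if token]


# The dot is not a separator, so the method name is never a token on its own.
print(current_tokens("CozyClient.fromOldClient"))
print(current_tokens("io.cozy.bills"))
```

Because the dot is inside the negated class, both the doctype and the qualified method name come out as a single token each, which matches the search behavior described above.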