meilisearch/charabia

Implement the `CharNormalizer` trait on the `LowercaseNormalizer` struct

Kerollmops opened this issue · 0 comments

It would be much better to implement the `CharNormalizer` trait directly on the `LowercaseNormalizer` struct, instead of the `Normalizer` trait. That way, the char map is created whenever it is needed. Currently, the char map can be missing after normalization. You can find an example of how to implement it in the `ChineseNormalizer`.

    fn normalize<'o>(&self, mut token: Token<'o>, _options: &NormalizerOption) -> Token<'o> {
        match token.char_map.take() {
            Some(char_map) => {
                let mut new_lemma = String::with_capacity(token.lemma.len());
                let mut new_char_map = Vec::with_capacity(char_map.len());
                let mut s = token.lemma.as_ref();
                for (orig_len, new_len) in char_map {
                    let (chunk, tail) = s.split_at(new_len as usize);
                    s = tail;
                    let lowercased_chunk = chunk.to_lowercase();
                    new_char_map.push((orig_len, lowercased_chunk.len() as u8));
                    new_lemma.push_str(&lowercased_chunk);
                }
                token.lemma = Cow::Owned(new_lemma);
                token.char_map = Some(new_char_map);
            }
            None => token.lemma = Cow::Owned(token.lemma().to_lowercase()),
        }
        token
    }
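
To make the idea concrete, here is a minimal, self-contained sketch of a per-character approach. It does not use charabia's real types: the `Token` struct and the `CharNormalizer` trait below are simplified, hypothetical stand-ins written for this example, and their signatures are assumptions, not the crate's API. The point is only that once the normalizer supplies a per-character rule, a shared driver can build the char map even when the incoming token has none.

```rust
use std::borrow::Cow;

/// Simplified stand-in for a token (assumption: not charabia's real `Token`).
pub struct Token<'o> {
    pub lemma: Cow<'o, str>,
    /// One `(original_len, normalized_len)` pair, in bytes, per original character.
    pub char_map: Option<Vec<(u8, u8)>>,
}

/// Hypothetical per-character trait standing in for `CharNormalizer`.
pub trait CharNormalizer {
    /// Returns the normalized form of `c`, or `None` to drop the character.
    fn normalize_char(&self, c: char) -> Option<String>;

    /// Shared driver: the returned token always carries a char map.
    fn normalize<'o>(&self, mut token: Token<'o>) -> Token<'o> {
        let mut new_lemma = String::with_capacity(token.lemma.len());
        let new_char_map = match token.char_map.take() {
            // A map already exists: normalize chunk by chunk and keep the
            // original lengths, updating only the normalized lengths.
            Some(char_map) => {
                let mut new_map = Vec::with_capacity(char_map.len());
                let mut rest: &str = &token.lemma;
                for (orig_len, len) in char_map {
                    let (chunk, tail) = rest.split_at(len as usize);
                    rest = tail;
                    let start = new_lemma.len();
                    for c in chunk.chars() {
                        if let Some(n) = self.normalize_char(c) {
                            new_lemma.push_str(&n);
                        }
                    }
                    new_map.push((orig_len, (new_lemma.len() - start) as u8));
                }
                new_map
            }
            // No map yet: create one, with one entry per character of the lemma.
            None => {
                let mut new_map = Vec::with_capacity(token.lemma.len());
                for c in token.lemma.chars() {
                    let start = new_lemma.len();
                    if let Some(n) = self.normalize_char(c) {
                        new_lemma.push_str(&n);
                    }
                    new_map.push((c.len_utf8() as u8, (new_lemma.len() - start) as u8));
                }
                new_map
            }
        };
        token.lemma = Cow::Owned(new_lemma);
        token.char_map = Some(new_char_map);
        token
    }
}

pub struct LowercaseNormalizer;

impl CharNormalizer for LowercaseNormalizer {
    fn normalize_char(&self, c: char) -> Option<String> {
        // `char::to_lowercase` may yield more than one char, hence the String.
        Some(c.to_lowercase().collect())
    }
}
```

One caveat with this sketch: lowercasing character by character is not strictly equivalent to `str::to_lowercase`, which applies a context-sensitive rule for a word-final Greek capital sigma. Whether that difference matters here is left open.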