What is the expected Lindera throughput (MB/s)?
fmassot opened this issue · 1 comments
fmassot commented
It would be nice to have a rough idea of the throughput of the tokenizers on the README page.
I'm currently testing Lindera, and my benchmarks use code like this:
group
    .throughput(Throughput::Bytes(SOME_JPN_TEXT.len() as u64))
    .bench_with_input("japanese-tokenize-medium", SOME_JPN_TEXT, |b, text| {
        b.iter(|| process_tokens(&japanese_tokenizer, black_box(text)));
    });
I typically get between 6 MB/s and 9 MB/s with the Lindera IPADIC dictionary.
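For anyone who wants to sanity-check a number like this without a benchmark harness, here is a minimal sketch of the same bytes-per-second calculation using only the standard library. `process_tokens_stub` is a hypothetical stand-in (it just splits on whitespace) and is not Lindera's API; swap in the real tokenizer call to get a meaningful figure. The sketch assumes decimal megabytes (1 MB = 1,000,000 bytes), which may differ from the unit a given benchmarking tool reports.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Convert a byte count and an elapsed time into MB/s.
/// Assumption: decimal megabytes, 1 MB = 1_000_000 bytes.
fn throughput_mb_per_s(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / 1_000_000.0 / elapsed.as_secs_f64()
}

/// Hypothetical stand-in for the real tokenizer call.
/// It only counts whitespace-separated pieces; replace it with the
/// tokenizer you actually want to measure.
fn process_tokens_stub(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Tokenize `text` `iterations` times and return the observed MB/s.
fn measure(text: &str, iterations: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the compiler from optimizing the work away.
        black_box(process_tokens_stub(black_box(text)));
    }
    let elapsed = start.elapsed();
    throughput_mb_per_s(text.len() as u64 * u64::from(iterations), elapsed)
}

fn main() {
    // Placeholder input; use a representative Japanese corpus in practice.
    let text = "すもも も もも も もも の うち ".repeat(1_000);
    let mb_s = measure(&text, 100);
    println!("{mb_s:.1} MB/s");
}
```

A single timed loop like this is noisier than a proper benchmark run, so treat the result as a rough cross-check of the order of magnitude rather than a replacement for the harness.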
And last but not least, thanks for maintaining this great library.