lindera-morphology/lindera

What is the expected Lindera throughput (MB/s)?

fmassot opened this issue · 1 comment

It would be nice to have a rough idea of the throughput of the tokenizers on the README page.

I'm currently testing Lindera, and my benchmarks use code like this:

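    // Tell Criterion how many bytes each iteration processes so it reports bytes/s.
    group
        .throughput(Throughput::Bytes(SOME_JPN_TEXT.len() as u64))
        .bench_with_input("japanese-tokenize-medium", SOME_JPN_TEXT, |b, text| {
            // black_box keeps the compiler from optimizing the input away.
            b.iter(|| process_tokens(&japanese_tokenizer, black_box(text)));
        });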

I typically get between 6 MB/s and 9 MB/s with Lindera's IPADIC dictionary.
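
For anyone who wants to cross-check a number like this without Criterion, here is a minimal sketch of the same measurement using only the standard library. `tokenize_stub`, `throughput_mb_per_s`, and `measure` are hypothetical names invented for this example, and the stub is a placeholder rather than Lindera's API; swap in the real tokenizer call to get a meaningful figure. It also assumes 1 MB = 1,000,000 bytes, which may differ from the unit a benchmark harness prints.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a real tokenizer call; replace the body with
/// the tokenizer under test. It only counts whitespace-separated pieces.
fn tokenize_stub(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Converts a byte count and an elapsed time into MB/s
/// (assumption: 1 MB = 1_000_000 bytes).
fn throughput_mb_per_s(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / 1_000_000.0 / elapsed.as_secs_f64()
}

/// Runs `f` over `text` `iterations` times and returns the measured MB/s.
fn measure<F: Fn(&str) -> usize>(text: &str, iterations: u32, f: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the compiler from eliding the work.
        black_box(f(black_box(text)));
    }
    // `str::len` is the UTF-8 byte length, not the character count.
    let total_bytes = text.len() as u64 * u64::from(iterations);
    throughput_mb_per_s(total_bytes, start.elapsed())
}

fn main() {
    // Sample input repeated to get a text of a few tens of kilobytes.
    let text = "関西国際空港 限定 トートバッグ ".repeat(1_000);
    let mbps = measure(&text, 100, tokenize_stub);
    println!("{mbps:.1} MB/s");
}
```

A single timed loop like this is much noisier than a Criterion run (no warm-up, no outlier handling), so treat it as a rough sanity check only.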

And last but not least, thanks for maintaining this great library.

mosuka commented

@fmassot
That is a good idea. I will measure it later and add the results to README.md.
Thanks!