lindera-morphology/lindera

Failed to create tokenizer on v0.22.0

RShirohara opened this issue · 3 comments

Summary

With Lindera v0.22.0, creating a tokenizer panics on the main thread with "range end index 4 out of range for slice of length 0".

Steps To Reproduce

  1. Add the dependency to Cargo.toml

    [package]
    name = "lindera-build-test"
    version = "0.1.0"
    edition = "2021"
    
    [dependencies]
    lindera = { version = "0.22.0", features = ["ipadic"] }

  2. Add the code to src/main.rs

    use lindera::{
        tokenizer::{DictionaryConfig, Tokenizer, TokenizerConfig},
        DictionaryKind,
    };
    
    fn main() {
        let config = TokenizerConfig {
            dictionary: DictionaryConfig {
                kind: Some(DictionaryKind::IPADIC),
                path: None,
            },
            ..TokenizerConfig::default()
        };
    
        // create tokenizer
        let tokenizer = Tokenizer::from_config(config).unwrap();
    
        // tokenize text
        let tokens = tokenizer.tokenize("たとえば私はこの文章を書く。").unwrap();
    
        // output tokens
        for token in tokens {
            println!("{}", token.text);
        }
    }

Expected result

The tokens are printed, one per line.

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.14s
     Running `target/debug/lindera-build-test`
たとえば
私
は
この
文章
を
書く
。

Actual result

The program panics while creating the tokenizer.

message: Thread 'main' panicked at 'range end index 4 out of range for slice of length 0', /home/rshirohara/.local/share/cargo/registry/src/github.com-1ecc6299db9ec823/lindera-core-0.22.0/src/connection.rs:12:53

Output with `RUST_BACKTRACE=1`:

$ RUST_BACKTRACE=1 cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/lindera-build-test`
thread 'main' panicked at 'range end index 4 out of range for slice of length 0', /home/rshirohara/.local/share/cargo/registry/src/github.com-1ecc6299db9ec823/lindera-core-0.22.0/src/connection.rs:12:53
stack backtrace:
   0: rust_begin_unwind
             at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/panicking.rs:64:14
   2: core::slice::index::slice_end_index_len_fail_rt
             at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/slice/index.rs:77:5
   3: core::slice::index::slice_end_index_len_fail
             at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/slice/index.rs:69:9
   4: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index
             at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/slice/index.rs:409:13
   5: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/slice/index.rs:18:9
   6: lindera_core::connection::ConnectionCostMatrix::load
             at /home/rshirohara/.local/share/cargo/registry/src/github.com-1ecc6299db9ec823/lindera-core-0.22.0/src/connection.rs:12:53
   7: lindera_ipadic::connection
             at /home/rshirohara/.local/share/cargo/registry/src/github.com-1ecc6299db9ec823/lindera-ipadic-0.22.0/src/lib.rs:106:5
   8: lindera_ipadic::load_dictionary
             at /home/rshirohara/.local/share/cargo/registry/src/github.com-1ecc6299db9ec823/lindera-ipadic-0.22.0/src/lib.rs:91:22
   9: lindera_dictionary::load_dictionary_from_kind
             at /home/rshirohara/.local/share/cargo/registry/src/github.com-1ecc6299db9ec823/lindera-dictionary-0.22.0/src/lib.rs:104:35
  10: lindera::builder::load_dictionary
             at /home/rshirohara/.local/share/cargo/registry/src/github.com-1ecc6299db9ec823/lindera-0.22.0/src/builder.rs:58:13
  11: lindera::tokenizer::Tokenizer::from_config
             at /home/rshirohara/.local/share/cargo/registry/src/github.com-1ecc6299db9ec823/lindera-0.22.0/src/tokenizer.rs:221:26
  12: lindera_build_test::main
             at ./src/main.rs:16:21
  13: core::ops::function::FnOnce::call_once
             at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/ops/function.rs:507:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
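
For readers unfamiliar with this panic message: frames 2 to 6 of the backtrace show a range index on a slice failing inside `ConnectionCostMatrix::load`, and "slice of length 0" means the slice being indexed was empty at that point. The sketch below is not Lindera's code. It is a minimal, standalone illustration of how indexing an empty byte slice with `[..4]` produces exactly this message, and how `get(..4)` avoids the panic. The function names `read_header_unchecked` and `read_header_checked` are hypothetical, and why the data was empty in v0.22.0 is not stated in this issue.

```rust
use std::panic;

/// Hypothetical helper (not from Lindera): reads a little-endian u32
/// from the first 4 bytes. `data[..4]` panics with
/// "range end index 4 out of range for slice of length N"
/// when the slice holds fewer than 4 bytes.
fn read_header_unchecked(data: &[u8]) -> u32 {
    let b = &data[..4];
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Checked variant: `get(..4)` returns `None` for short input
/// instead of panicking.
fn read_header_checked(data: &[u8]) -> Option<u32> {
    let b = data.get(..4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn main() {
    // Short input is reported as `None` by the checked variant.
    println!("{:?}", read_header_checked(&[]));
    // Four bytes are enough to decode a value.
    println!("{:?}", read_header_checked(&[1, 0, 0, 0]));

    // The unchecked variant panics on empty input; the panic is caught
    // here so the program can report it (the default panic hook still
    // prints the message to stderr).
    let result = panic::catch_unwind(|| read_header_unchecked(&[]));
    println!("unchecked read on empty input panicked: {}", result.is_err());
}
```

A panic of this kind cannot be handled through the `Result` returned by `Tokenizer::from_config`, which is why the `unwrap()` in the reproduction never gets a chance to report an error value.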

System Information

  • rustup: 1.25.2 (2023-02-04)
  • toolchain:
    • channel: stable
    • host: x86_64-unknown-linux-gnu
    • rustc version: 1.67.1 (d5a82bbd2 2023-02-07)

Additional information

This issue was written with the help of Google Translate and DeepL Translator.

@RShirohara
Thank you for letting me know.
I'll fix it as soon as possible.

@RShirohara
I've released v0.22.1. Please check it out.
Thanks again! 😃

@mosuka
I have confirmed that it works with v0.22.1.
Thanks! 😄