AMythicDev/minus

Panics when searching for multi-byte characters

Closed this issue · 4 comments

When searching for strings containing double width characters, the program panics at this call to String::drain, since the endpoint is not on a char boundary. I haven't figured out how to fix it.

Steps to reproduce

Search 甲州 in the pager with the following code.

Dummy text in Japanese: text.txt

main.rs

use std::fmt::Write;

const STR: &str = include_str!("./text.txt");

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut pager = minus::Pager::new();
    write!(pager.lines, "{}", STR)?;
    minus::page_all(pager)?;
    Ok(())
}

Cargo.toml

# ...
minus = { version = "3.3", features = ["search", "static_output"] }

Panic backtrace

thread 'main' panicked at 'assertion failed: self.is_char_boundary(end)', /home/akitaki/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs:1572:9
stack backtrace:
0: rust_begin_unwind
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
1: core::panicking::panic_fmt
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:92:14
2: core::panicking::panic
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:50:5
3: alloc::string::String::drain
    at /home/akitaki/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs:1572:9
4: minus::search::find
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/minus-3.3.3/src/search.rs:194:9
5: minus::search::highlight_search
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/minus-3.3.3/src/search.rs:166:13
6: minus::init::static_paging
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/minus-3.3.3/src/init.rs:78:32
7: minus::static_pager::page_all
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/minus-3.3.3/src/static_pager.rs:86:13
8: indicatif_test::main::{{closure}}
    at ./src/main.rs:10:5
9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
    at /home/akitaki/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:80:19
10: tokio::park::thread::CachedParkThread::block_on::{{closure}}
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/park/thread.rs:263:54
11: tokio::coop::with_budget::{{closure}}
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/coop.rs:106:9
12: std::thread::local::LocalKey<T>::try_with
    at /home/akitaki/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:272:16
13: std::thread::local::LocalKey<T>::with
    at /home/akitaki/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:248:9
14: tokio::coop::with_budget
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/coop.rs:99:5
15: tokio::coop::budget
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/coop.rs:76:5
16: tokio::park::thread::CachedParkThread::block_on
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/park/thread.rs:263:31
17: tokio::runtime::enter::Enter::block_on
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/enter.rs:151:13
18: tokio::runtime::thread_pool::ThreadPool::block_on
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/mod.rs:71:9
19: tokio::runtime::Runtime::block_on
    at /home/akitaki/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/mod.rs:452:43
20: indicatif_test::main
    at ./src/main.rs:11:5
21: core::ops::function::FnOnce::call_once
    at /home/akitaki/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Thanks for putting up the issue. A new version of minus is in the works, if you want to fix it, you should probably do it in the 4.0 branch, which will be the next major release of minus

After some investigation,

  • It seems like the panic is caused by the mixed usage of byte indices and unicode character indices, which are not always the same. Any multibyte character (for example à) can trigger the bug, so it also happens on languages with accented letters like Spanish.
  • On the master branch,
    • The panic can be solved by replacing let truncate = x + query.char_indices().count(); with let truncate = x + query.len();, because String::drain requires a byte range, not a character index range.
    • After fixing the panic, the cursor positioning is wrong. I'm not familiar enough (yet) with the codebase enough to fix this.
  • On the 4.0 branch, the panic occurs before the pager is shown, due to a call to split_at() with confused index. I've used the textwrap library to resolve it, but I'm not sure if we should include that as a dependency. It seems better to just delegate line wrapping to mature libraries that can deal with arbitrary Unicode strings, instead of implementing as part of the minus library itself.
    pub(crate) fn break_line(mut line: &str, cols: usize) -> Vec<String> {
          textwrap::wrap(line, cols)
              .into_iter()
              .map(|line| line.to_string())
              .collect()
    }
Panic backtrace on branch 4.0
thread 'main' panicked at 'byte index 86 is not a char boundary; it is inside '張' (bytes 84..87) of `織田信長は、織田弾正忠家の当主・織田信秀の子に生まれ、尾張(愛`', /home/akitaki/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/mod.rs:565:13
stack backtrace:
0: rust_begin_unwind
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
1: core::panicking::panic_fmt
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:92:14
2: core::str::slice_error_fail
3: core::str::<impl str>::split_at
    at /home/akitaki/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/mod.rs:565:13
4: minus::line::Line::break_line
    at /home/akitaki/code/minus/src/line.rs:129:36
5: minus::line::Line::new
    at /home/akitaki/code/minus/src/line.rs:112:26
6: minus::line::Line::from_str
    at /home/akitaki/code/minus/src/line.rs:148:24
7: minus::Pager::prepare
    at /home/akitaki/code/minus/src/lib.rs:322:45
8: minus::static_pager::page_all
    at /home/akitaki/code/minus/src/static_pager.rs:62:5
9: indicatif_test::main
    at ./src/main.rs:6:5
10: core::ops::function::FnOnce::call_once
    at /home/akitaki/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

I am doing yet another rewrite using the textwrap crate, hope it will soon fix the panic

I have fixed this in 4.0 branch. Closing this issue.