gyscos/cursive

crash on showing specially crafted string

alexanderkjall opened this issue · 3 comments

First: is this a bug report? A suggestion? Or asking for help?

bug report

Problem description

My fuzzing found an panic inside cursive itself when displaying a string supplied from an untrusted source:

thread '<unnamed>' panicked at 'byte index 1 is not a char boundary; it is inside '\u{85}' (bytes 0..2) of `
-`', /home/capitol/projects/cursive/cursive-core/src/utils/lines/spans/segment.rs:24:24

Can be replicated with with program:

use cursive::views::{Dialog, TextView};
use std::str;

fn main() {
    let d: Vec<u8> = vec![194, 133, 45, 127, 29, 127, 127];
    let str = str::from_utf8(&d);
    if let Ok(s) = str {
        // Creates the cursive root - required for every application.
        let mut siv = cursive::default();

        // Creates a dialog with a single "Quit" button
        siv.add_layer(Dialog::around(TextView::new(s))
            .title("Cursive")
            .button("Quit", |s| s.quit()));

        siv.run();
    } else {
        println!("not valid utf8");
    }
}

Environment

  • Operating system: linux (ubuntu 20.04)
  • Backend used: ncurses (the default one) but looks like it applies to all of them
  • Current locale (run locale in a terminal).
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=nb_NO.UTF-8
LC_TIME=nb_NO.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=nb_NO.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=nb_NO.UTF-8
LC_NAME=nb_NO.UTF-8
LC_ADDRESS=nb_NO.UTF-8
LC_TELEPHONE=nb_NO.UTF-8
LC_MEASUREMENT=nb_NO.UTF-8
LC_IDENTIFICATION=nb_NO.UTF-8
LC_ALL=
  • Cursive version (from crates.io, from git, ...): replicated in git an 0.15

Hi, and thank you for the detailed report! Seems like we're doing something wrong there, will investigate.

Ah yeah we totally assume that any line-breaking character is 1-byte in utf-8... smart...

I pushed a tentative fix (and added this as a text case). Though now I see a new problem: ncurses behaves differently from other backends like termion/crossterm when attempting to print control characters like \u{7f} or \u{1d}:

Ncurses tries to print something:
Screenshot from 2020-08-26 15-20-59

Crossterm just ignores it:
Screenshot from 2020-08-26 15-20-49

UnicodeWidth returns a 0-width for these control characters, so the correct behaviour would be not to print them.

But rather than patching ncurses to not print them, it may be easier to sanitize the user input and remove these characters from any provided string?

On the other hand, there is a longer-term plan to bring something like agavrilov/cursive_buffered_backend in cursive itself, keeping an internal character grid before flushing to the backend. If we had that, it would be easy to ignore any zero-width grapheme.

I can't really think of any use case where you would want to print those kind of control characters "raw", maybe if want to build a hex editor, but even then you need to sanitize/replace unprintable characters with something else.