wilsonzlin/minify-html

Undefined behavior when minifying style containing non-utf-8 content

Nemo157 opened this issue · 2 comments

Test case:

fn main() {
    let input = b"<style>\xff</style>";
    let cfg = minify_html::Cfg {
        minify_css: true,
        ..minify_html::Cfg::default()
    };
    let output = minify_html::minify(input, &cfg);
    println!("{}", String::from_utf8(output).unwrap());
}

When run with cargo miri run outputs:

error: Undefined Behavior: entering unreachable code
  --> /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/validations.rs:49:23
   |
49 |     let y = unsafe { *bytes.next().unwrap_unchecked() };
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ entering unreachable code
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE:
   = note: inside `core::str::validations::next_code_point::<'_, std::slice::Iter<'_, u8>>` at /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/validations.rs:49:23: 49:54
   = note: inside `<std::str::Chars<'_> as std::iter::Iterator>::next` at /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/iter.rs:44:18: 44:49
   = note: inside `<std::str::CharIndices<'_> as std::iter::Iterator>::next` at /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/iter.rs:140:15: 140:31
   = note: inside `<std::str::pattern::MultiCharEqSearcher<'_, [closure@nom::character::complete::multispace1<&str, nom::error::Error<&str>>::{closure#0}]> as std::str::pattern::Searcher<'_>>::next` at /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/pattern.rs:663:31: 663:39
   = note: inside `<std::str::pattern::MultiCharEqSearcher<'_, [closure@nom::character::complete::multispace1<&str, nom::error::Error<&str>>::{closure#0}]> as std::str::pattern::Searcher<'_>>::next_match` at /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/pattern.rs:234:19: 234:30
   = note: inside `<std::str::pattern::CharPredicateSearcher<'_, [closure@nom::character::complete::multispace1<&str, nom::error::Error<&str>>::{closure#0}]> as std::str::pattern::Searcher<'_>>::next_match` at /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/pattern.rs:754:13: 754:32
   = note: inside `core::str::<impl str>::find::<'_, [closure@nom::character::complete::multispace1<&str, nom::error::Error<&str>>::{closure#0}]>` at /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/mod.rs:1147:9: 1147:45
   = note: inside `<&str as nom::traits::InputTakeAtPosition>::split_at_position1_complete::<[closure@nom::character::complete::multispace1<&str, nom::error::Error<&str>>::{closure#0}], nom::error::Error<&str>>` at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/traits.rs:683:11: 683:31
   = note: inside `nom::character::complete::multispace1::<&str, nom::error::Error<&str>>` at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/character/complete.rs:700:3: 706:4
   = note: inside `<fn(&str) -> std::result::Result<(&str, &str), nom::internal::Err<nom::error::Error<&str>>> {nom::character::complete::multispace1::<&str, nom::error::Error<&str>>} as std::ops::FnMut<(&str,)>>::call_mut - shim(fn(&str) -> std::result::Result<(&str, &str), nom::internal::Err<nom::error::Error<&str>>> {nom::character::complete::multispace1::<&str, nom::error::Error<&str>>})` at /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:166:5: 166:75
   = note: inside `<fn(&str) -> std::result::Result<(&str, &str), nom::internal::Err<nom::error::Error<&str>>> {nom::character::complete::multispace1::<&str, nom::error::Error<&str>>} as nom::internal::Parser<&str, &str, nom::error::Error<&str>>>::parse` at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/internal.rs:325:5: 325:12
   = note: inside `<(fn(&str) -> std::result::Result<(&str, &str), nom::internal::Err<nom::error::Error<&str>>> {nom::character::complete::multispace1::<&str, nom::error::Error<&str>>}, for<'a> fn(&'a str) -> std::result::Result<(&'a str, &'a str), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_comment}) as nom::branch::Alt<&str, &str, nom::error::Error<&str>>>::choice` at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/branch/mod.rs:135:15: 135:42
   = note: inside closure at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/branch/mod.rs:50:15: 50:26
   = note: inside `<[closure@nom::branch::alt<&str, &str, nom::error::Error<&str>, (fn(&str) -> std::result::Result<(&str, &str), nom::internal::Err<nom::error::Error<&str>>> {nom::character::complete::multispace1::<&str, nom::error::Error<&str>>}, for<'a> fn(&'a str) -> std::result::Result<(&'a str, &'a str), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_comment})>::{closure#0}] as nom::internal::Parser<&str, &str, nom::error::Error<&str>>>::parse` at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/internal.rs:325:5: 325:12
   = note: inside closure at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/multi/mod.rs:63:13: 63:31
   = note: inside `css_minify::parsers::utils::parse_useless` at cargo/registry/src/index.crates.io-6f17d22bba15001f/css-minify-0.3.1/src/parsers/utils.rs:18:5: 18:52
   = note: inside `<for<'a> fn(&'a str) -> std::result::Result<(&'a str, std::vec::Vec<&'a str>), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_useless} as std::ops::FnMut<(&str,)>>::call_mut - shim(for<'a> fn(&'a str) -> std::result::Result<(&'a str, std::vec::Vec<&'a str>), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_useless})` at /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:166:5: 166:75
   = note: inside `<for<'a> fn(&'a str) -> std::result::Result<(&'a str, std::vec::Vec<&'a str>), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_useless} as nom::internal::Parser<&str, std::vec::Vec<&str>, nom::error::Error<&str>>>::parse` at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/internal.rs:325:5: 325:12
   = note: inside `<(for<'a> fn(&'a str) -> std::result::Result<(&'a str, std::vec::Vec<&'a str>), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_useless}, [closure@nom::multi::many0<&str, css_minify::structure::CssEntity, nom::error::Error<&str>, for<'a> fn(&'a str) -> std::result::Result<(&'a str, css_minify::structure::CssEntity), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::css_entity::parse_entity}>::{closure#0}], for<'a> fn(&'a str) -> std::result::Result<(&'a str, std::vec::Vec<&'a str>), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_useless}) as nom::sequence::Tuple<&str, (std::vec::Vec<&str>, std::vec::Vec<css_minify::structure::CssEntity>, std::vec::Vec<&str>), nom::error::Error<&str>>>::parse` at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/sequence/mod.rs:236:18: 236:49
   = note: inside closure at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/sequence/mod.rs:278:15: 278:25
   = note: inside `<[closure@nom::sequence::tuple<&str, (std::vec::Vec<&str>, std::vec::Vec<css_minify::structure::CssEntity>, std::vec::Vec<&str>), nom::error::Error<&str>, (for<'a> fn(&'a str) -> std::result::Result<(&'a str, std::vec::Vec<&'a str>), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_useless}, [closure@nom::multi::many0<&str, css_minify::structure::CssEntity, nom::error::Error<&str>, for<'a> fn(&'a str) -> std::result::Result<(&'a str, css_minify::structure::CssEntity), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::css_entity::parse_entity}>::{closure#0}], for<'a> fn(&'a str) -> std::result::Result<(&'a str, std::vec::Vec<&'a str>), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_useless})>::{closure#0}] as nom::internal::Parser<&str, (std::vec::Vec<&str>, std::vec::Vec<css_minify::structure::CssEntity>, std::vec::Vec<&str>), nom::error::Error<&str>>>::parse` at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/internal.rs:325:5: 325:12
   = note: inside closure at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/combinator/mod.rs:79:23: 79:42
   = note: inside `<[closure@nom::combinator::map<&str, (std::vec::Vec<&str>, std::vec::Vec<css_minify::structure::CssEntity>, std::vec::Vec<&str>), std::vec::Vec<css_minify::structure::CssEntity>, nom::error::Error<&str>, [closure@nom::sequence::tuple<&str, (std::vec::Vec<&str>, std::vec::Vec<css_minify::structure::CssEntity>, std::vec::Vec<&str>), nom::error::Error<&str>, (for<'a> fn(&'a str) -> std::result::Result<(&'a str, std::vec::Vec<&'a str>), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_useless}, [closure@nom::multi::many0<&str, css_minify::structure::CssEntity, nom::error::Error<&str>, for<'a> fn(&'a str) -> std::result::Result<(&'a str, css_minify::structure::CssEntity), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::css_entity::parse_entity}>::{closure#0}], for<'a> fn(&'a str) -> std::result::Result<(&'a str, std::vec::Vec<&'a str>), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::utils::parse_useless})>::{closure#0}], [closure@css_minify::parsers::utils::non_useless<'_, std::vec::Vec<css_minify::structure::CssEntity>, [closure@nom::multi::many0<&str, css_minify::structure::CssEntity, nom::error::Error<&str>, for<'a> fn(&'a str) -> std::result::Result<(&'a str, css_minify::structure::CssEntity), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::css_entity::parse_entity}>::{closure#0}]>::{closure#0}]>::{closure#0}] as nom::internal::Parser<&str, std::vec::Vec<css_minify::structure::CssEntity>, nom::error::Error<&str>>>::parse` at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/internal.rs:325:5: 325:12
   = note: inside closure at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/combinator/mod.rs:79:23: 79:42
   = note: inside `css_minify::parsers::css_entity::parse_entities` at cargo/registry/src/index.crates.io-6f17d22bba15001f/css-minify-0.3.1/src/parsers/css_entity.rs:18:5: 18:67
   = note: inside `<for<'a> fn(&'a str) -> std::result::Result<(&'a str, css_minify::structure::CssEntities), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::css_entity::parse_entities} as std::ops::FnMut<(&str,)>>::call_mut - shim(for<'a> fn(&'a str) -> std::result::Result<(&'a str, css_minify::structure::CssEntities), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::css_entity::parse_entities})` at /home/nemo157/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:166:5: 166:75
   = note: inside `<for<'a> fn(&'a str) -> std::result::Result<(&'a str, css_minify::structure::CssEntities), nom::internal::Err<nom::error::Error<&'a str>>> {css_minify::parsers::css_entity::parse_entities} as nom::internal::Parser<&str, css_minify::structure::CssEntities, nom::error::Error<&str>>>::parse` at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/internal.rs:325:5: 325:12
   = note: inside closure at cargo/registry/src/index.crates.io-6f17d22bba15001f/nom-7.1.3/src/combinator/mod.rs:390:24: 390:38
   = note: inside `css_minify::parsers::css_entity::parse_css` at cargo/registry/src/index.crates.io-6f17d22bba15001f/css-minify-0.3.1/src/parsers/css_entity.rs:14:5: 14:41
   = note: inside `css_minify::optimizations::Minifier::minify` at cargo/registry/src/index.crates.io-6f17d22bba15001f/css-minify-0.3.1/src/optimizations/mod.rs:39:26: 39:42
   = note: inside `minify_html::minify::css::minify_css` at cargo/registry/src/index.crates.io-6f17d22bba15001f/minify-html-0.11.1/src/minify/css.rs:9:22: 21:10
   = note: inside `minify_html::minify::content::minify_content` at cargo/registry/src/index.crates.io-6f17d22bba15001f/minify-html-0.11.1/src/minify/content.rs:149:43: 149:70
   = note: inside `minify_html::minify::element::minify_element` at cargo/registry/src/index.crates.io-6f17d22bba15001f/minify-html-0.11.1/src/minify/element.rs:105:5: 116:6
   = note: inside `minify_html::minify::content::minify_content` at cargo/registry/src/index.crates.io-6f17d22bba15001f/minify-html-0.11.1/src/minify/content.rs:133:18: 145:14
   = note: inside `minify_html::minify::element::minify_element` at cargo/registry/src/index.crates.io-6f17d22bba15001f/minify-html-0.11.1/src/minify/element.rs:105:5: 116:6
   = note: inside `minify_html::minify::content::minify_content` at cargo/registry/src/index.crates.io-6f17d22bba15001f/minify-html-0.11.1/src/minify/content.rs:133:18: 145:14
   = note: inside `minify_html::minify` at cargo/registry/src/index.crates.io-6f17d22bba15001f/minify-html-0.11.1/src/lib.rs:43:5: 50:6
note: inside `main`
  --> src/main.rs:7:18
   |
7  |     let output = minify_html::minify(input, &cfg);
   |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The issue is at this line:

unsafe { from_utf8_unchecked(code) },

When parsing the style element it is not checked that it contains utf-8, simply finding where the corresponding </style> is, so this is creating an &str to non-utf-8 content.

And similarly for a style attribute with non-utf-8 content b"<body style=\"\xff\" />" hitting

value_raw_wrapped.push_str(unsafe { from_utf8_unchecked(&value_raw) });

Thanks for raising. This was a carryover from the onepass variant as a performance optimisation; the input must be UTF-8 (as mentioned in the entrypoint functions) but this isn't checked/enforced. As performance isn't as important in minify-html, I've changed these to the safe variants with an .expect, and added #[deny(unsafe_code)].