servo/html5ever

Jumping on uninitalized value

Kixiron opened this issue · 13 comments

While running valgrind on my program, I found this

==28112== Conditional jump or move depends on uninitialised value(s)
==28112==    at 0xB28252: html5ever::tree_builder::data::doctype_error_and_quirks (data.rs:157)
==28112==    by 0x682617: <html5ever::tree_builder::TreeBuilder<Handle,Sink> as html5ever::tokenizer::interface::TokenSink>::process_token (mod.rs:472)
==28112==    by 0x6D0AA8: _ZN9html5ever9tokenizer21Tokenizer$LT$Sink$GT$13process_token17ha27fb7b70a53cc47E.llvm.10445680363282595522 (mod.rs:237)
==28112==    by 0x6D926A: html5ever::tokenizer::Tokenizer<Sink>::step (mod.rs:0)
==28112==    by 0x6D390A: _ZN9html5ever9tokenizer21Tokenizer$LT$Sink$GT$3run17h29926122ee9878e2E.llvm.10445680363282595522 (mod.rs:369)
==28112==    by 0x62196F: feed<html5ever::tree_builder::TreeBuilder<kuchiki::tree::NodeRef, kuchiki::parser::Sink>> (mod.rs:224)
==28112==    by 0x62196F: <html5ever::driver::Parser<Sink> as tendril::stream::TendrilSink<tendril::fmt::UTF8>>::process (driver.rs:109)
==28112==    by 0x5E9504: process<html5ever::driver::Parser<kuchiki::parser::Sink>,tendril::tendril::NonAtomic> (stream.rs:180)
==28112==    by 0x5E9504: one<tendril::stream::Utf8LossyDecoder<html5ever::driver::Parser<kuchiki::parser::Sink>, tendril::tendril::NonAtomic>,tendril::fmt::Bytes,tendril::tendril::NonAtomic,&[u8]> (stream.rs:48)
==28112==    by 0x5E9504: cratesfyi::utils::html::extract_head_and_body (html.rs:28)
==28112==    by 0x76F641: cratesfyi::web::rustdoc::rustdoc_html_server_handler (rustdoc.rs:312)
==28112==    by 0x751A1B: call<fn(&mut iron::request::Request) -> core::result::Result<iron::response::Response, iron::error::IronError>,(&mut iron::request::Request)> (function.rs:72)
==28112==    by 0x751A1B: <F as iron::middleware::Handler>::handle (mod.rs:312)
==28112==    by 0xA3ED50: <alloc::boxed::Box<dyn iron::middleware::Handler> as iron::middleware::Handler>::handle (mod.rs:318)
==28112==    by 0x7009C6: <cratesfyi::web::metrics::RequestRecorder as iron::middleware::Handler>::handle (metrics.rs:185)
==28112==    by 0xA3ED50: <alloc::boxed::Box<dyn iron::middleware::Handler> as iron::middleware::Handler>::handle (mod.rs:318)
==28112==  Uninitialised value was created by a stack allocation
==28112==    at 0xB27BC0: html5ever::tree_builder::data::doctype_error_and_quirks (data.rs:91)

gdb then pointed out the source of the problem, which is this snippet

html5ever::tree_builder::data::doctype_error_and_quirks (doctype=<optimized out>, iframe_srcdoc=false) at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.25.1/src/tree_builder/data.rs:157
157             (_, Some(ref s)) if QUIRKY_SYSTEM_MATCHES.contains(s) => Quirks,
jdm commented

@Kixiron Do you have a public codebase that reproduces the error that I can refer to?

jdm commented

I've tried running https://github.com/jyn514/html5ever-example/blob/master/src/struct.CaptureMatches.html through a kuchiki example program under valgrind, and I don't see the error you reported. A minimal program that reproduces the error would be most useful for me.

The code's all on this branch, I made a minimal reproduction with this being all the parse-related code. For reproduction steps I simply ran cargo build --release --bin html5ever and then valgrind target/release/html5ever

Me running that binary gives this output

==9592== Conditional jump or move depends on uninitialised value(s)
==9592==    at 0x1B4CF2: html5ever::tree_builder::data::doctype_error_and_quirks (data.rs:157)
==9592==    by 0x168607: <html5ever::tree_builder::TreeBuilder<Handle,Sink> as html5ever::tokenizer::interface::TokenSink>::process_token (mod.rs:472)
==9592==    by 0x187348: _ZN9html5ever9tokenizer21Tokenizer$LT$Sink$GT$13process_token17ha27fb7b70a53cc47E.llvm.10445680363282595522 (mod.rs:237)
==9592==    by 0x18FB0A: html5ever::tokenizer::Tokenizer<Sink>::step (mod.rs:0)
==9592==    by 0x18A1AA: _ZN9html5ever9tokenizer21Tokenizer$LT$Sink$GT$3run17h29926122ee9878e2E.llvm.10445680363282595522 (mod.rs:369)
==9592==    by 0x15D78F: feed<html5ever::tree_builder::TreeBuilder<kuchiki::tree::NodeRef, kuchiki::parser::Sink>> (mod.rs:224)
==9592==    by 0x15D78F: <html5ever::driver::Parser<Sink> as tendril::stream::TendrilSink<tendril::fmt::UTF8>>::process (driver.rs:109)
==9592==    by 0x158024: process<html5ever::driver::Parser<kuchiki::parser::Sink>,tendril::tendril::NonAtomic> (stream.rs:180)
==9592==    by 0x158024: one<tendril::stream::Utf8LossyDecoder<html5ever::driver::Parser<kuchiki::parser::Sink>, tendril::tendril::NonAtomic>,tendril::fmt::Bytes,tendril::tendril::NonAtomic,&[u8]> (stream.rs:48)
==9592==    by 0x158024: cratesfyi::utils::html::extract_head_and_body (html.rs:28)
==9592==    by 0x1554F1: html5ever::main (html5ever.rs:4)
==9592==    by 0x154242: std::rt::lang_start::{{closure}} (rt.rs:67)
==9592==    by 0x1E42C7: {{closure}} (rt.rs:52)
==9592==    by 0x1E42C7: do_call<closure-0,i32> (panicking.rs:331)
==9592==    by 0x1E42C7: try<i32,closure-0> (panicking.rs:274)
==9592==    by 0x1E42C7: catch_unwind<closure-0,i32> (panic.rs:394)
==9592==    by 0x1E42C7: std::rt::lang_start_internal (rt.rs:51)
==9592==    by 0x155687: main (in /mnt/g/Users/Chase/Code/Rust/docs.rs/target/release/html5ever)
jdm commented

Oh, interesting, I can reproduce the same problem in the kuchiki example when running with --release, but not with a debug build.

Same on my end, debug builds have no issues

These sorts of errors are typically because LLVM has made valid optimizations but valgrind doesn’t/can’t understand that, so valgrind flags a potential error.

jdm commented

Also interesting - when I add a println above this line for opt_string_as_slice(&system) the valgrind error disappears. My suspicion that this is a compiler bug grows.

jdm commented

Rewriting the match as a series of equivalent if let and if statements does not change the valgrind output.

jdm commented

The following diff makes the error disappear:

diff --git a/html5ever/src/tree_builder/data.rs b/html5ever/src/tree_builder/data.rs
index 9d51a71..5e84200 100644
--- a/html5ever/src/tree_builder/data.rs
+++ b/html5ever/src/tree_builder/data.rs
@@ -147,14 +147,14 @@ pub fn doctype_error_and_quirks(doctype: &Doctype, iframe_srcdoc: bool) -> (bool
     let public = opt_to_ascii_lower(public);
     let system = opt_to_ascii_lower(system);
 
-    let quirk = match (opt_string_as_slice(&public), opt_string_as_slice(&system)) {
+    let quirk = match (opt_string_as_slice(&public), &system) {
         _ if doctype.force_quirks => Quirks,
         _ if name != Some("html") => Quirks,
 
         _ if iframe_srcdoc => NoQuirks,
 
         (Some(ref p), _) if QUIRKY_PUBLIC_MATCHES.contains(p) => Quirks,
-        (_, Some(ref s)) if QUIRKY_SYSTEM_MATCHES.contains(s) => Quirks,
+        (_, Some(ref s)) if QUIRKY_SYSTEM_MATCHES.contains(&&**s) => Quirks,
 
         (Some(p), _) if contains_pfx(QUIRKY_PUBLIC_PREFIXES, p) => Quirks,
         (Some(p), _) if contains_pfx(LIMITED_QUIRKY_PUBLIC_PREFIXES, p) => LimitedQuirks,
jdm commented

This is a minimal reproduction of the same problem. I'll try filing it on rustc and see if anything comes of it:

fn main() {
    let mut args = std::env::args();
    let s1 = args.next();
    let s2 = args.next();
    let a_matches = &["a"];
    let b_matches = &["b"];
    let a = s1.as_ref().map(|s| &s[..]);
    let b = s2.as_ref().map(|s| &s[..]);
    match (a, b) {
        (Some(ref a), _) if a_matches.contains(a) => println!("a"),
        (_, Some(ref b)) if b_matches.contains(b) => println!("b"),
        _ => println!("nothing"),
    }
}
jdm commented

Given rust-lang/rust#73344 (comment), I suspect the answer here is to update valgrind and leave the html5ever code unchanged.