Jumping on uninitalized value
Kixiron opened this issue · 13 comments
While running valgrind on my program, I found this
==28112== Conditional jump or move depends on uninitialised value(s)
==28112== at 0xB28252: html5ever::tree_builder::data::doctype_error_and_quirks (data.rs:157)
==28112== by 0x682617: <html5ever::tree_builder::TreeBuilder<Handle,Sink> as html5ever::tokenizer::interface::TokenSink>::process_token (mod.rs:472)
==28112== by 0x6D0AA8: _ZN9html5ever9tokenizer21Tokenizer$LT$Sink$GT$13process_token17ha27fb7b70a53cc47E.llvm.10445680363282595522 (mod.rs:237)
==28112== by 0x6D926A: html5ever::tokenizer::Tokenizer<Sink>::step (mod.rs:0)
==28112== by 0x6D390A: _ZN9html5ever9tokenizer21Tokenizer$LT$Sink$GT$3run17h29926122ee9878e2E.llvm.10445680363282595522 (mod.rs:369)
==28112== by 0x62196F: feed<html5ever::tree_builder::TreeBuilder<kuchiki::tree::NodeRef, kuchiki::parser::Sink>> (mod.rs:224)
==28112== by 0x62196F: <html5ever::driver::Parser<Sink> as tendril::stream::TendrilSink<tendril::fmt::UTF8>>::process (driver.rs:109)
==28112== by 0x5E9504: process<html5ever::driver::Parser<kuchiki::parser::Sink>,tendril::tendril::NonAtomic> (stream.rs:180)
==28112== by 0x5E9504: one<tendril::stream::Utf8LossyDecoder<html5ever::driver::Parser<kuchiki::parser::Sink>, tendril::tendril::NonAtomic>,tendril::fmt::Bytes,tendril::tendril::NonAtomic,&[u8]> (stream.rs:48)
==28112== by 0x5E9504: cratesfyi::utils::html::extract_head_and_body (html.rs:28)
==28112== by 0x76F641: cratesfyi::web::rustdoc::rustdoc_html_server_handler (rustdoc.rs:312)
==28112== by 0x751A1B: call<fn(&mut iron::request::Request) -> core::result::Result<iron::response::Response, iron::error::IronError>,(&mut iron::request::Request)> (function.rs:72)
==28112== by 0x751A1B: <F as iron::middleware::Handler>::handle (mod.rs:312)
==28112== by 0xA3ED50: <alloc::boxed::Box<dyn iron::middleware::Handler> as iron::middleware::Handler>::handle (mod.rs:318)
==28112== by 0x7009C6: <cratesfyi::web::metrics::RequestRecorder as iron::middleware::Handler>::handle (metrics.rs:185)
==28112== by 0xA3ED50: <alloc::boxed::Box<dyn iron::middleware::Handler> as iron::middleware::Handler>::handle (mod.rs:318)
==28112== Uninitialised value was created by a stack allocation
==28112== at 0xB27BC0: html5ever::tree_builder::data::doctype_error_and_quirks (data.rs:91)
gdb then pointed out the source of the problem, which is this snippet
html5ever::tree_builder::data::doctype_error_and_quirks (doctype=<optimized out>, iframe_srcdoc=false) at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.25.1/src/tree_builder/data.rs:157
157 (_, Some(ref s)) if QUIRKY_SYSTEM_MATCHES.contains(s) => Quirks,
I've tried running https://github.com/jyn514/html5ever-example/blob/master/src/struct.CaptureMatches.html through a kuchiki example program under valgrind, and I don't see the error you reported. A minimal program that reproduces the error would be most useful for me.
The code's all on this branch, I made a minimal reproduction with this being all the parse-related code. For reproduction steps I simply ran cargo build --release --bin html5ever
and then valgrind target/release/html5ever
Me running that binary gives this output
==9592== Conditional jump or move depends on uninitialised value(s)
==9592== at 0x1B4CF2: html5ever::tree_builder::data::doctype_error_and_quirks (data.rs:157)
==9592== by 0x168607: <html5ever::tree_builder::TreeBuilder<Handle,Sink> as html5ever::tokenizer::interface::TokenSink>::process_token (mod.rs:472)
==9592== by 0x187348: _ZN9html5ever9tokenizer21Tokenizer$LT$Sink$GT$13process_token17ha27fb7b70a53cc47E.llvm.10445680363282595522 (mod.rs:237)
==9592== by 0x18FB0A: html5ever::tokenizer::Tokenizer<Sink>::step (mod.rs:0)
==9592== by 0x18A1AA: _ZN9html5ever9tokenizer21Tokenizer$LT$Sink$GT$3run17h29926122ee9878e2E.llvm.10445680363282595522 (mod.rs:369)
==9592== by 0x15D78F: feed<html5ever::tree_builder::TreeBuilder<kuchiki::tree::NodeRef, kuchiki::parser::Sink>> (mod.rs:224)
==9592== by 0x15D78F: <html5ever::driver::Parser<Sink> as tendril::stream::TendrilSink<tendril::fmt::UTF8>>::process (driver.rs:109)
==9592== by 0x158024: process<html5ever::driver::Parser<kuchiki::parser::Sink>,tendril::tendril::NonAtomic> (stream.rs:180)
==9592== by 0x158024: one<tendril::stream::Utf8LossyDecoder<html5ever::driver::Parser<kuchiki::parser::Sink>, tendril::tendril::NonAtomic>,tendril::fmt::Bytes,tendril::tendril::NonAtomic,&[u8]> (stream.rs:48)
==9592== by 0x158024: cratesfyi::utils::html::extract_head_and_body (html.rs:28)
==9592== by 0x1554F1: html5ever::main (html5ever.rs:4)
==9592== by 0x154242: std::rt::lang_start::{{closure}} (rt.rs:67)
==9592== by 0x1E42C7: {{closure}} (rt.rs:52)
==9592== by 0x1E42C7: do_call<closure-0,i32> (panicking.rs:331)
==9592== by 0x1E42C7: try<i32,closure-0> (panicking.rs:274)
==9592== by 0x1E42C7: catch_unwind<closure-0,i32> (panic.rs:394)
==9592== by 0x1E42C7: std::rt::lang_start_internal (rt.rs:51)
==9592== by 0x155687: main (in /mnt/g/Users/Chase/Code/Rust/docs.rs/target/release/html5ever)
Oh, interesting, I can reproduce the same problem in the kuchiki example when running with --release, but not with a debug build.
Same on my end, debug builds have no issues
These sorts of errors are typically because LLVM has made valid optimizations but valgrind doesn’t/can’t understand that, so valgrind flags a potential error.
Also interesting - when I add a println above this line for opt_string_as_slice(&system)
the valgrind error disappears. My suspicion that this is a compiler bug grows.
Rewriting the match as a series of equivalent if let and if statements does not change the valgrind output.
The following diff makes the error disappear:
diff --git a/html5ever/src/tree_builder/data.rs b/html5ever/src/tree_builder/data.rs
index 9d51a71..5e84200 100644
--- a/html5ever/src/tree_builder/data.rs
+++ b/html5ever/src/tree_builder/data.rs
@@ -147,14 +147,14 @@ pub fn doctype_error_and_quirks(doctype: &Doctype, iframe_srcdoc: bool) -> (bool
let public = opt_to_ascii_lower(public);
let system = opt_to_ascii_lower(system);
- let quirk = match (opt_string_as_slice(&public), opt_string_as_slice(&system)) {
+ let quirk = match (opt_string_as_slice(&public), &system) {
_ if doctype.force_quirks => Quirks,
_ if name != Some("html") => Quirks,
_ if iframe_srcdoc => NoQuirks,
(Some(ref p), _) if QUIRKY_PUBLIC_MATCHES.contains(p) => Quirks,
- (_, Some(ref s)) if QUIRKY_SYSTEM_MATCHES.contains(s) => Quirks,
+ (_, Some(ref s)) if QUIRKY_SYSTEM_MATCHES.contains(&&**s) => Quirks,
(Some(p), _) if contains_pfx(QUIRKY_PUBLIC_PREFIXES, p) => Quirks,
(Some(p), _) if contains_pfx(LIMITED_QUIRKY_PUBLIC_PREFIXES, p) => LimitedQuirks,
This is a minimal reproduction of the same problem. I'll try filing it on rustc and see if anything comes of it:
fn main() {
let mut args = std::env::args();
let s1 = args.next();
let s2 = args.next();
let a_matches = &["a"];
let b_matches = &["b"];
let a = s1.as_ref().map(|s| &s[..]);
let b = s2.as_ref().map(|s| &s[..]);
match (a, b) {
(Some(ref a), _) if a_matches.contains(a) => println!("a"),
(_, Some(ref b)) if b_matches.contains(b) => println!("b"),
_ => println!("nothing"),
}
}
Filed rust-lang/rust#73344.
Given rust-lang/rust#73344 (comment), I suspect the answer here is to update valgrind and leave the html5ever code unchanged.