Valgrind reports "possibly lost" when using static `Regex`
wyfo opened this issue · 7 comments
What version of regex are you using?
regex = "1.10.5"
Describe the bug at a high level.
Valgrind reports "possibly lost" when using static Regex.
What are the steps to reproduce the behavior?
use regex::Regex;

static mut REGEX: Option<Regex> = None;

fn main() {
    unsafe {
        REGEX = Regex::new(r"").ok();
        REGEX.as_ref().unwrap().captures("");
    }
}
What is the actual behavior?
Here are the valgrind command and its report:
valgrind --leak-check=full --num-callers=50 target/debug/regex-leak
==17154== Memcheck, a memory error detector
==17154== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==17154== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==17154== Command: target/debug/regex-leak
==17154==
==17154==
==17154== HEAP SUMMARY:
==17154== in use at exit: 7,266 bytes in 51 blocks
==17154== total heap usage: 122 allocs, 71 frees, 13,048 bytes allocated
==17154==
==17154== 108 bytes in 1 blocks are possibly lost in loss record 41 of 51
==17154== at 0x4885250: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==17154== by 0x1B4B67: alloc (alloc.rs:98)
==17154== by 0x1B4B67: alloc::alloc::Global::alloc_impl (alloc.rs:181)
==17154== by 0x1B557B: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:241)
==17154== by 0x1AE187: hashbrown::raw::alloc::inner::do_alloc (alloc.rs:15)
==17154== by 0x1B6D77: hashbrown::raw::RawTableInner::new_uninitialized (mod.rs:1754)
==17154== by 0x1B70FB: hashbrown::raw::RawTableInner::fallible_with_capacity (mod.rs:1792)
==17154== by 0x1B5F13: hashbrown::raw::RawTableInner::prepare_resize (mod.rs:2871)
==17154== by 0x1B9167: resize_inner<alloc::alloc::Global> (mod.rs:3067)
==17154== by 0x1B9167: reserve_rehash_inner<alloc::alloc::Global> (mod.rs:2957)
==17154== by 0x1B9167: hashbrown::raw::RawTable<T,A>::reserve_rehash (mod.rs:1235)
==17154== by 0x1BA85F: hashbrown::raw::RawTable<T,A>::reserve (mod.rs:1183)
==17154== by 0x1B9A6B: hashbrown::raw::RawTable<T,A>::find_or_find_insert_slot (mod.rs:1417)
==17154== by 0x189B6F: hashbrown::map::HashMap<K,V,S,A>::insert (map.rs:1754)
==17154== by 0x17BEAB: std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:1105)
==17154== by 0x1CA35B: regex_automata::hybrid::dfa::Lazy::add_state (dfa.rs:2309)
==17154== by 0x1CC05B: regex_automata::hybrid::dfa::Lazy::init_cache (dfa.rs:2534)
==17154== by 0x1C85EF: regex_automata::hybrid::dfa::Cache::new (dfa.rs:1891)
==17154== by 0x22914F: regex_automata::hybrid::regex::Cache::new (regex.rs:613)
==17154== by 0x228B2B: regex_automata::hybrid::regex::Regex::create_cache (regex.rs:192)
==17154== by 0x1A5DCF: regex_automata::meta::wrappers::HybridCache::new::{{closure}} (wrappers.rs:788)
==17154== by 0x1BDB03: core::option::Option<T>::map (option.rs:1072)
==17154== by 0x1A5D9B: regex_automata::meta::wrappers::HybridCache::new (wrappers.rs:788)
==17154== by 0x1A5183: regex_automata::meta::wrappers::Hybrid::create_cache (wrappers.rs:541)
==17154== by 0x192E43: <regex_automata::meta::strategy::Core as regex_automata::meta::strategy::Strategy>::create_cache (strategy.rs:679)
==17154== by 0x18B70F: regex_automata::meta::regex::Builder::build_many_from_hir::{{closure}} (regex.rs:3556)
==17154== by 0x14D73B: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call (boxed.rs:2021)
==17154== by 0x149223: regex_automata::util::pool::inner::Pool<T,F>::get_slow (pool.rs:568)
==17154== by 0x1490DF: regex_automata::util::pool::inner::Pool<T,F>::get (pool.rs:533)
==17154== by 0x14C01F: regex_automata::util::pool::Pool<T,F>::get (pool.rs:182)
==17154== by 0x149D6B: regex_automata::meta::regex::Regex::search_slots (regex.rs:1134)
==17154== by 0x149EC3: regex_automata::meta::regex::Regex::search_captures (regex.rs:1065)
==17154== by 0x14B453: regex::regex::string::Regex::captures_at (string.rs:1151)
==17154== by 0x14B5B3: regex::regex::string::Regex::captures (string.rs:356)
==17154== by 0x14ACC3: regex_leak::main (main.rs:8)
==17154== by 0x14B73B: core::ops::function::FnOnce::call_once (function.rs:250)
==17154== by 0x14D7F7: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:154)
==17154== by 0x14AFEB: std::rt::lang_start::{{closure}} (rt.rs:167)
==17154== by 0x344577: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==17154== by 0x344577: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:552)
==17154== by 0x344577: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:516)
==17154== by 0x344577: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:142)
==17154== by 0x344577: {closure#2} (rt.rs:148)
==17154== by 0x344577: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:552)
==17154== by 0x344577: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:516)
==17154== by 0x344577: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:142)
==17154== by 0x344577: std::rt::lang_start_internal (rt.rs:148)
==17154== by 0x14AFBB: std::rt::lang_start (rt.rs:166)
==17154== by 0x14AD03: main (in /app/target/debug/regex-leak)
==17154==
==17154== 108 bytes in 1 blocks are possibly lost in loss record 42 of 51
==17154== at 0x4885250: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==17154== by 0x1B4B67: alloc (alloc.rs:98)
==17154== by 0x1B4B67: alloc::alloc::Global::alloc_impl (alloc.rs:181)
==17154== by 0x1B557B: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:241)
==17154== by 0x1AE187: hashbrown::raw::alloc::inner::do_alloc (alloc.rs:15)
==17154== by 0x1B6D77: hashbrown::raw::RawTableInner::new_uninitialized (mod.rs:1754)
==17154== by 0x1B70FB: hashbrown::raw::RawTableInner::fallible_with_capacity (mod.rs:1792)
==17154== by 0x1B5F13: hashbrown::raw::RawTableInner::prepare_resize (mod.rs:2871)
==17154== by 0x1B9167: resize_inner<alloc::alloc::Global> (mod.rs:3067)
==17154== by 0x1B9167: reserve_rehash_inner<alloc::alloc::Global> (mod.rs:2957)
==17154== by 0x1B9167: hashbrown::raw::RawTable<T,A>::reserve_rehash (mod.rs:1235)
==17154== by 0x1BA85F: hashbrown::raw::RawTable<T,A>::reserve (mod.rs:1183)
==17154== by 0x1B9A6B: hashbrown::raw::RawTable<T,A>::find_or_find_insert_slot (mod.rs:1417)
==17154== by 0x189B6F: hashbrown::map::HashMap<K,V,S,A>::insert (map.rs:1754)
==17154== by 0x17BEAB: std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:1105)
==17154== by 0x1CA35B: regex_automata::hybrid::dfa::Lazy::add_state (dfa.rs:2309)
==17154== by 0x1CC05B: regex_automata::hybrid::dfa::Lazy::init_cache (dfa.rs:2534)
==17154== by 0x1C85EF: regex_automata::hybrid::dfa::Cache::new (dfa.rs:1891)
==17154== by 0x229187: regex_automata::hybrid::regex::Cache::new (regex.rs:614)
==17154== by 0x228B2B: regex_automata::hybrid::regex::Regex::create_cache (regex.rs:192)
==17154== by 0x1A5DCF: regex_automata::meta::wrappers::HybridCache::new::{{closure}} (wrappers.rs:788)
==17154== by 0x1BDB03: core::option::Option<T>::map (option.rs:1072)
==17154== by 0x1A5D9B: regex_automata::meta::wrappers::HybridCache::new (wrappers.rs:788)
==17154== by 0x1A5183: regex_automata::meta::wrappers::Hybrid::create_cache (wrappers.rs:541)
==17154== by 0x192E43: <regex_automata::meta::strategy::Core as regex_automata::meta::strategy::Strategy>::create_cache (strategy.rs:679)
==17154== by 0x18B70F: regex_automata::meta::regex::Builder::build_many_from_hir::{{closure}} (regex.rs:3556)
==17154== by 0x14D73B: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call (boxed.rs:2021)
==17154== by 0x149223: regex_automata::util::pool::inner::Pool<T,F>::get_slow (pool.rs:568)
==17154== by 0x1490DF: regex_automata::util::pool::inner::Pool<T,F>::get (pool.rs:533)
==17154== by 0x14C01F: regex_automata::util::pool::Pool<T,F>::get (pool.rs:182)
==17154== by 0x149D6B: regex_automata::meta::regex::Regex::search_slots (regex.rs:1134)
==17154== by 0x149EC3: regex_automata::meta::regex::Regex::search_captures (regex.rs:1065)
==17154== by 0x14B453: regex::regex::string::Regex::captures_at (string.rs:1151)
==17154== by 0x14B5B3: regex::regex::string::Regex::captures (string.rs:356)
==17154== by 0x14ACC3: regex_leak::main (main.rs:8)
==17154== by 0x14B73B: core::ops::function::FnOnce::call_once (function.rs:250)
==17154== by 0x14D7F7: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:154)
==17154== by 0x14AFEB: std::rt::lang_start::{{closure}} (rt.rs:167)
==17154== by 0x344577: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==17154== by 0x344577: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:552)
==17154== by 0x344577: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:516)
==17154== by 0x344577: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:142)
==17154== by 0x344577: {closure#2} (rt.rs:148)
==17154== by 0x344577: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:552)
==17154== by 0x344577: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:516)
==17154== by 0x344577: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:142)
==17154== by 0x344577: std::rt::lang_start_internal (rt.rs:148)
==17154== by 0x14AFBB: std::rt::lang_start (rt.rs:166)
==17154== by 0x14AD03: main (in /app/target/debug/regex-leak)
==17154==
==17154== LEAK SUMMARY:
==17154== definitely lost: 0 bytes in 0 blocks
==17154== indirectly lost: 0 bytes in 0 blocks
==17154== possibly lost: 216 bytes in 2 blocks
==17154== still reachable: 7,050 bytes in 49 blocks
==17154== suppressed: 0 bytes in 0 blocks
==17154== Reachable blocks (those to which a pointer was found) are not shown.
==17154== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==17154==
==17154== For lists of detected and suppressed errors, rerun with: -s
==17154== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
What is the expected behavior?
I expect still reachable blocks, but I don't know the reason for the possibly lost ones.
Yeah, so? You're sticking the Regex in a global mutable variable. Its destructor will never run.
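For illustration, here is a minimal std-only sketch of that point: a value stored in a static is never dropped at program exit, so its heap allocation is still live when Valgrind scans the process. `Noisy`, `DROPS`, `GLOBAL`, and the two helper functions are hypothetical names, and `OnceLock` stands in for the `static mut` used in the report.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times `Noisy::drop` has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a type that owns heap memory, like `Regex`.
struct Noisy(Vec<u8>);

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Rust does not run destructors for statics when the program exits.
static GLOBAL: OnceLock<Noisy> = OnceLock::new();

/// Initializes the global on first use and returns the size of its buffer.
fn init_global() -> usize {
    GLOBAL.get_or_init(|| Noisy(vec![0u8; 108])).0.len()
}

/// Creates and drops a local value, returning how many drops that caused.
fn drops_from_local() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let _local = Noisy(vec![0u8; 8]);
    } // `_local` goes out of scope here, so its destructor runs.
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("global holds {} bytes", init_global());
    println!("drops from a local value: {}", drops_from_local());
    // The buffer owned by GLOBAL is never freed before the process exits.
    println!("total drops before exit: {}", DROPS.load(Ordering::SeqCst));
}
```

The local value is dropped once, while the buffer owned by `GLOBAL` stays allocated until the process ends, which is the kind of block a leak checker reports at exit.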
Sorry, I misclicked and submitted without the full text. I've edited the description.
I know the destructor is never run, but usually (at least in my short experience), it results in still reachable and not in possibly lost.
Quoting the Valgrind documentation:
"possibly lost" means your program is leaking memory, unless you're doing unusual things with pointers that could cause them to point into the middle of an allocated block; see the user manual for some possible causes. Use --show-possibly-lost=no if you don't want to see these reports.
"still reachable" means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports.
possibly lost may thus indicate a possible bug, which is why valgrind's default behavior is to treat it as an error.
My question is then: do these possibly lost blocks fall into the category of "unusual things with pointers that could cause them to point into the middle of an allocated block", or could this be a non-trivial memory leak?
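As a sketch of what "point into the middle of an allocated block" can look like in Rust, the std-only example below allocates a block and keeps only a pointer offset into it. The names `INTERIOR`, `BLOCK_SIZE`, `OFFSET`, and `alloc_interior` are hypothetical. The hashbrown frames in the traces are suggestive here: as far as I know, hashbrown's raw table stores a pointer to its control bytes, which sit after the buckets inside a single allocation, so the only pointer it keeps is an interior one. That reading is an assumption and is not confirmed anywhere in this thread.

```rust
use std::alloc::{alloc, Layout};
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical global that holds only an interior pointer to a heap block.
static INTERIOR: AtomicPtr<u8> = AtomicPtr::new(std::ptr::null_mut());

const BLOCK_SIZE: usize = 108;
const OFFSET: usize = 64;

/// Allocates a block and returns a pointer `OFFSET` bytes past its start.
/// The start address itself is discarded, and the block is never freed.
fn alloc_interior() -> *mut u8 {
    let layout = Layout::from_size_align(BLOCK_SIZE, 8).expect("valid layout");
    // SAFETY: the layout has a non-zero size.
    let base = unsafe { alloc(layout) };
    assert!(!base.is_null(), "allocation failed");
    // SAFETY: OFFSET is smaller than BLOCK_SIZE, so the result is in bounds.
    unsafe { base.add(OFFSET) }
}

fn main() {
    let interior = alloc_interior();
    // Only the interior pointer survives in reachable memory.
    INTERIOR.store(interior, Ordering::SeqCst);
    println!("stored interior pointer {:p}", interior);
}
```

A leak checker that scans memory at exit finds a pointer into the block but none to its start, which is the situation the quoted documentation describes for "possibly lost".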
To give some context, I'm facing this possible leak using tracing_subscriber::EnvFilter, which uses Lazy<Regex> internally https://github.com/tokio-rs/tracing/blob/master/tracing-subscriber/src/filter/env/directive.rs#L123.
About the still reachable/possibly lost distinction: tracing uses a static subscriber, which results in a still reachable report (see tokio-rs/tracing#2069), and that's totally fine. However, when using EnvFilter the leak changes category, "because" of this Lazy<Regex>.
Another piece of information that may be useful: the number and size of the possibly lost blocks change with the regex.
For example:
Regex::new(r"") -> possibly lost: 216 bytes in 2 blocks
Regex::new(r"\w") -> possibly lost: 416 bytes in 2 blocks
Regex::new(r"(?P<name>\w)") -> possibly lost: 524 bytes in 3 blocks
Ok, I understand now that this falls into the "doing unusual things with pointers" case; sorry for the bother. It seems I have to add a valgrind suppression.
My prior is that valgrind reports false positives, and that its behavior at least partially depends on the allocator being used. So I need more evidence.
Otherwise, I don't really see anything wrong here. Like yes... you have a leak because you aren't running a regex's destructor. And yes, it changes with different patterns because different patterns require different amounts of heap... I'm not sure why you would expect anything different.
Also, regex internally doesn't really do anything that would cause leaks in the first place. Rust is itself doing a lot of leak checking already.