memory leak in regex
Bugzilla Link | 51659 |
Version | 11.0 |
OS | Linux |
Reporter | LLVM Bugzilla Contributor |
CC | @mclow |
Extended Description
The following program leaks memory (using clang 11 on Debian Bullseye, Debian clang version 11.0.1-2):
paul@machine:~/code/stdfuzz/build$ cat problem.cpp
#include <regex>
int
main()
{
  std::regex{ R"(()*)",
              std::regex_constants::icase | std::regex_constants::nosubs |
                std::regex::optimize | std::regex::collate | std::regex::grep };
}
paul@machine:~/code/stdfuzz/build$ clang++-11 --stdlib=libc++ problem.cpp -fsanitize=leak -g
paul@machine:~/code/stdfuzz/build$ ./a.out
=================================================================
==18364==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x4172e8 in operator new(unsigned long) (/home/paul/code/stdfuzz/build/a.out+0x4172e8)
#1 0x44cb02 in std::__1::basic_regex<char, std::__1::regex_traits<char> >::__push_loop(unsigned long, unsigned long, std::__1::__owns_one_state<char>*, unsigned long, unsigned long, bool) /usr/lib/llvm-11/bin/../include/c++/v1/regex:4699:23
#2 0x44c962 in std::__1::basic_regex<char, std::__1::regex_traits<char> >::__push_greedy_inf_repeat(unsigned long, std::__1::__owns_one_state<char>*, unsigned int, unsigned int) /usr/lib/llvm-11/bin/../include/c++/v1/regex:2863:10
#3 0x44ddbd in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_RE_dupl_symbol<char const*>(char const*, char const*, std::__1::__owns_one_state<char>*, unsigned int, unsigned int) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3578:13
#4 0x44dc4b in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_simple_RE<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3259:23
#5 0x44db1c in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_RE_expression<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3239:35
#6 0x436aff in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_basic_reg_exp<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3133:23
#7 0x436cdb in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_grep<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:4617:9
#8 0x4366fd in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3107:19
#9 0x4363e1 in void std::__1::basic_regex<char, std::__1::regex_traits<char> >::__init<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3077:31
#10 0x43617f in std::__1::basic_regex<char, std::__1::regex_traits<char> >::basic_regex(char const*, std::__1::regex_constants::syntax_option_type) /usr/lib/llvm-11/bin/../include/c++/v1/regex:2556:9
#11 0x43609f in main /home/paul/code/stdfuzz/build/problem.cpp:3:1
#12 0x7f808b6bdd09 in __libc_start_main csu/../csu/libc-start.c:308:16
SUMMARY: LeakSanitizer: 16 byte(s) leaked in 1 allocation(s).
It also reproduces on Compiler Explorer with clang 12. Clang trunk is not working there at the moment, so I could not check it.
template <class _CharT, class _Traits>
void
basic_regex<_CharT, _Traits>::__push_loop(size_t __min, size_t __max,
        __owns_one_state<_CharT>* __s, size_t __mexp_begin, size_t __mexp_end,
        bool __greedy)
{
    unique_ptr<__empty_state<_CharT> > __e1(new __empty_state<_CharT>(__end_->first()));
    __end_->first() = nullptr; // <<<<<<<< LEAKS HERE
    unique_ptr<__loop<_CharT> > __e2(new __loop<_CharT>(__loop_count_,
                __s->first(), __e1.get(), __mexp_begin, __mexp_end, __greedy,
                __min, __max));
    __s->first() = nullptr;
    __e1.release();
    __end_->first() = new __repeat_one_loop<_CharT>(__e2.get());
    __end_ = __e2->second();
    __s->first() = __e2.release();
    ++__loop_count_;
}
One possible patch would be to change __has_one_state::__first_ to a unique_ptr and update the call sites accordingly.
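To illustrate the general idea behind the unique_ptr suggestion, here is a minimal sketch. It is not libc++ code and does not claim to pinpoint the actual leak in __push_loop. The names Node, RawHolder, OwningHolder, g_live_nodes, overwrite_raw, overwrite_owning and run_demo are all hypothetical stand-ins. It only shows that overwriting a raw owning pointer loses the object, while overwriting a unique_ptr frees it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical live-object counter, used only to observe the leak.
int g_live_nodes = 0;

// Hypothetical stand-in for a regex state node.
struct Node {
    Node() { ++g_live_nodes; }
    ~Node() { --g_live_nodes; }
};

// Raw-pointer slot: assigning to `first` never frees the old node.
struct RawHolder {
    Node* first = nullptr;
};

// Owning slot: assigning to `first` destroys the old node.
struct OwningHolder {
    std::unique_ptr<Node> first;
};

// Returns how many nodes are still alive right after the slot is cleared.
int overwrite_raw() {
    RawHolder h;
    h.first = new Node;
    Node* stashed = h.first;        // kept only so this demo can clean up
    h.first = nullptr;              // the node is now unreachable through h
    const int alive = g_live_nodes; // still 1: nothing freed the node
    delete stashed;                 // real leaking code has no such copy
    return alive;
}

// Same sequence with an owning slot: clearing it deletes the node.
int overwrite_owning() {
    OwningHolder h;
    h.first.reset(new Node);
    h.first = nullptr;              // unique_ptr deletes the old node here
    return g_live_nodes;            // 0: nothing left alive
}

// Checks the two behaviours described above.
void run_demo() {
    assert(overwrite_raw() == 1);
    assert(overwrite_owning() == 0);
    assert(g_live_nodes == 0);
}
```

Calling run_demo() from a main function exercises both variants. In the real code the call sites read and write first() directly, which is why they would all need updating.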
I was considering how to add a regression test for this. Would the right place be something like libcxx/test/std/re/re.leaks/issue_51001.cpp? When I tried putting a test there, llvm-lit did not pick it up as a test. There would also be the matter of writing a lit config file there that adds the leak sanitizer to the command line.
The re/re.foo naming matches the sections in the Standard. Something in libcxx/test/std/re/re.const/re.matchflag seems more appropriate.
Lit requires two extensions to identify a test, so the file should be named foo.pass.cpp. (There are other options besides pass, which make lit run different kinds of tests.)
What am I missing? I can't reproduce it with https://godbolt.org/z/87xdP1jYx.
@philnik777 it looks like the shared godbolt uses address sanitizer instead of leak sanitizer.
Here's an updated version that should use leak sanitizer: https://godbolt.org/z/Y3hvs5hah
It also doesn't reproduce there, so I am going ahead and closing the issue. Please double check and re-open if this still reproduces on your end with a recent Clang/libc++ version.
Hi, original bug submitter here. The formatting changed when the report was transferred from Bugzilla to GitHub: two backslashes were lost, and they are important. The problem is still there:
@fhahn or @philnik777 could you please reopen this?
Actual reproducer:
#include <regex>
int main() {
std::regex{ R"(\(\)*)",
std::regex_constants::icase | std::regex_constants::nosubs |
std::regex::optimize | std::regex::collate | std::regex::grep };
}
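The lost backslashes matter because grep syntax follows POSIX basic regular expressions: an unescaped ( or ) is an ordinary character, and only \( and \) delimit a group. The sketch below illustrates that difference with a simpler pattern. The helper names grep_full_match and run_demo are hypothetical, and the example does not by itself demonstrate the leak.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if `text` matches `pattern` in full under grep syntax.
bool grep_full_match(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern, std::regex::grep);
    return std::regex_match(text, re);
}

// Shows how escaped and unescaped parentheses differ in grep syntax.
void run_demo() {
    // Unescaped parentheses are literal characters.
    assert(grep_full_match("(a)", "(a)"));
    assert(!grep_full_match("(a)", "a"));

    // Escaped parentheses form a group around "a".
    assert(grep_full_match(R"(\(a\))", "a"));
    assert(!grep_full_match(R"(\(a\))", "(a)"));
}
```

By the same rule, the pattern in the original report was parsed as literal parentheses rather than as an empty group followed by a repeat. That is consistent with the failed attempts to reproduce from the migrated text.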