llvm/llvm-project

memory leak in regex


Bugzilla Link 51659
Version 11.0
OS Linux
Reporter LLVM Bugzilla Contributor
CC @mclow

Extended Description

The following program leaks memory (using clang 11 on Debian Bullseye, Debian clang version 11.0.1-2):

paul@machine:~/code/stdfuzz/build$ cat problem.cpp
#include <regex>
int
main()
{
  std::regex{ R"(\(\)*)",
              std::regex_constants::icase | std::regex_constants::nosubs |
              std::regex::optimize | std::regex::collate | std::regex::grep };
}

paul@machine:~/code/stdfuzz/build$ clang++-11 --stdlib=libc++ problem.cpp -fsanitize=leak -g
paul@simdjson:~/code/stdfuzz/build$ ./a.out

=================================================================
==18364==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x4172e8 in operator new(unsigned long) (/home/paul/code/stdfuzz/build/a.out+0x4172e8)
#1 0x44cb02 in std::__1::basic_regex<char, std::__1::regex_traits<char> >::__push_loop(unsigned long, unsigned long, std::__1::__owns_one_state<char>*, unsigned long, unsigned long, bool) /usr/lib/llvm-11/bin/../include/c++/v1/regex:4699:23
#2 0x44c962 in std::__1::basic_regex<char, std::__1::regex_traits<char> >::__push_greedy_inf_repeat(unsigned long, std::__1::__owns_one_state<char>*, unsigned int, unsigned int) /usr/lib/llvm-11/bin/../include/c++/v1/regex:2863:10
#3 0x44ddbd in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_RE_dupl_symbol<char const*>(char const*, char const*, std::__1::__owns_one_state<char>*, unsigned int, unsigned int) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3578:13
#4 0x44dc4b in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_simple_RE<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3259:23
#5 0x44db1c in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_RE_expression<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3239:35
#6 0x436aff in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_basic_reg_exp<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3133:23
#7 0x436cdb in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_grep<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:4617:9
#8 0x4366fd in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3107:19
#9 0x4363e1 in void std::__1::basic_regex<char, std::__1::regex_traits<char> >::__init<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3077:31
#10 0x43617f in std::__1::basic_regex<char, std::__1::regex_traits<char> >::basic_regex(char const*, std::__1::regex_constants::syntax_option_type) /usr/lib/llvm-11/bin/../include/c++/v1/regex:2556:9
#11 0x43609f in main /home/paul/code/stdfuzz/build/problem.cpp:3:1
#12 0x7f808b6bdd09 in __libc_start_main csu/../csu/libc-start.c:308:16

SUMMARY: LeakSanitizer: 16 byte(s) leaked in 1 allocation(s).

It also reproduces on Compiler Explorer with clang 12; clang trunk does not work there at the moment.

template <class _CharT, class _Traits>
void
basic_regex<_CharT, _Traits>::__push_loop(size_t __min, size_t __max,
        __owns_one_state<_CharT>* __s, size_t __mexp_begin, size_t __mexp_end,
        bool __greedy)
{
    unique_ptr<__empty_state<_CharT> > __e1(new __empty_state<_CharT>(__end_->first()));
    __end_->first() = nullptr; // <<<<<<<< LEAKS HERE
    unique_ptr<__loop<_CharT> > __e2(new __loop<_CharT>(__loop_count_,
                __s->first(), __e1.get(), __mexp_begin, __mexp_end, __greedy,
                __min, __max));
    __s->first() = nullptr;
    __e1.release();
    __end_->first() = new __repeat_one_loop<_CharT>(__e2.get());
    __end_ = __e2->second();
    __s->first() = __e2.release();
    ++__loop_count_;
}

A possible fix might be to change __has_one_state::__first_ to a unique_ptr and update the call sites accordingly.
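The general property that suggestion relies on can be shown with a heavily simplified, hypothetical model. State, RawOwner and SmartOwner are illustrative names, not libc++ types, and this does not reproduce the __push_loop code path; it only shows that overwriting an owning raw pointer orphans the old object, while assigning to a unique_ptr destroys it.

```cpp
#include <memory>

// Hypothetical stand-in for an NFA state; counts how many instances are alive.
struct State {
  static inline int live = 0;  // C++17 inline static member
  State() { ++live; }
  ~State() { --live; }
};

// Owner with a raw owning pointer: deletes whatever it points at on destruction.
struct RawOwner {
  State* first = nullptr;
  RawOwner() = default;
  RawOwner(const RawOwner&) = delete;             // avoid double delete
  RawOwner& operator=(const RawOwner&) = delete;
  ~RawOwner() { delete first; }
};

// Owner with a unique_ptr, i.e. the direction suggested above for __first_.
struct SmartOwner {
  std::unique_ptr<State> first;
};

// Overwrites the raw pointer; returns how many States outlive the owner.
int leaked_with_raw_pointer() {
  const int before = State::live;
  State* orphan = nullptr;
  {
    RawOwner owner;
    owner.first = new State;
    orphan = owner.first;     // kept only so this demo can clean up afterwards
    owner.first = new State;  // the owner no longer refers to the first State
  }                           // ~RawOwner deletes only the second State
  const int leaked = State::live - before;
  delete orphan;              // tidy up so the demo itself does not leak
  return leaked;
}

// Same sequence with unique_ptr: the assignment destroys the old State.
int leaked_with_unique_ptr() {
  const int before = State::live;
  {
    SmartOwner owner;
    owner.first = std::make_unique<State>();
    owner.first = std::make_unique<State>();  // previous State destroyed here
  }                                           // remaining State destroyed here
  return State::live - before;
}
```

Here leaked_with_raw_pointer() returns 1 and leaked_with_unique_ptr() returns 0. A real change would also have to handle the non-owning pointers that __push_loop hands around (such as __e1.get() and __e2.get()), which this sketch does not model.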

I was considering how to add a regression test for this. Would the right place be something like libcxx/test/std/re/re.leaks/issue_51001.cpp? When I tried putting a test there, llvm-lit did not recognize that I had added a test. There would also be the matter of writing a lit config file there that could add the leak sanitizer to the command line.

The re/re.foo naming matches the sections in the Standard. Something in libcxx/test/std/re/re.const/re.matchflag seems more appropriate.

Lit requires two extensions to identify a test, so it should be named foo.pass.cpp. (Suffixes other than pass select different kinds of lit tests.)
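Following that naming advice, the body of such a regression test might look roughly like the sketch below. The function name reported_pattern_compiles is hypothetical; an actual test would live in a file ending in .pass.cpp and call it from int main(int, char**). On its own the sketch only checks that the pattern compiles; it can catch the leak only if the test binary is built with leak or address sanitizer enabled, which is assumed here rather than configured.

```cpp
#include <regex>

// Hypothetical regression-test body for the pattern from this report.
// It does not detect the leak by itself: that requires running the test
// under LeakSanitizer or AddressSanitizer (an assumption, not configured here).
bool reported_pattern_compiles() {
  try {
    std::regex re{R"(\(\)*)",
                  std::regex_constants::icase | std::regex_constants::nosubs |
                      std::regex::optimize | std::regex::collate |
                      std::regex::grep};
    (void)re;  // constructing (and destroying) the regex is the whole test
    return true;
  } catch (const std::regex_error&) {
    return false;  // the pattern was rejected instead of compiled
  }
}
```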

What am I missing? I can't reproduce it with https://godbolt.org/z/87xdP1jYx.

fhahn commented

@philnik777 it looks like the shared godbolt link uses address sanitizer instead of leak sanitizer.

Here's an updated version that should use leak sanitizer: https://godbolt.org/z/Y3hvs5hah

It also doesn't reproduce there, so I am going ahead and closing the issue. Please double-check and reopen it if this still reproduces on your end with a recent Clang/libc++ version.

Hi, original bug submitter here. The reproducer was mangled when the issue was transferred from Bugzilla to GitHub: two important backslashes were lost. The problem is still there:

godbolt

@fhahn or @philnik777 could you please reopen this?

Perhaps @mordante could reopen this?

Actual reproducer:

#include <regex>

int main() {
  std::regex{ R"(\(\)*)",
  std::regex_constants::icase | std::regex_constants::nosubs |
  std::regex::optimize | std::regex::collate | std::regex::grep };
}
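The lost backslashes change the meaning of the pattern, not just its spelling. The sketch below compares the two spellings using only std::regex::grep; dropping the other flags from the reproducer is a simplifying assumption, and the function names are hypothetical.

```cpp
#include <regex>

// Pattern as it appeared after the Bugzilla-to-GitHub transfer.
// In grep (basic) syntax, unescaped ( and ) are ordinary characters, so this
// means: a literal '(' followed by zero or more ')'.
bool stripped_matches(const char* s) {
  static const std::regex re{R"(()*)", std::regex::grep};
  return std::regex_match(s, re);
}

// Pattern from the actual reproducer: \( and \) delimit an empty group, and *
// repeats it, so only the empty string matches.
bool intended_matches(const char* s) {
  static const std::regex re{R"(\(\)*)", std::regex::grep};
  return std::regex_match(s, re);
}
```

For example, stripped_matches("()))") is true while intended_matches("()") is false, so the stripped version is not exercising a repeated group at all.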