ASan needs to keep track of all the libraries loaded during the process lifetime

Question

ramosian-glider opened this issue 9 years ago · 34 comments

Originally reported on Google Code with ID 89

In the following situation:

> malloc or free gets called from xyz.dylib
> xyz.dylib gets unloaded
> a bug happens and we want to report the stack trace of malloc/free, which has xyz.dylib in it.

we need to restore the library layout at the stack collection time in order to symbolize
it correctly.

Possible solution:

> We keep an epoch counter that is incremented for each dlopen and 
> dlclose (we also write down the [un]loaded library and the slide value 
> each time we do that). For each stack we just sacrifice one frame to 
> keep the corresponding counter. When symbolizing, it's easy to replay 
> the sequence of dlopen/dlclose events and find out which libraries 
> were loaded.
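The replay step described above could look roughly like the following. This is only a sketch under stated assumptions: the `lib_event` record and `resolve_at_epoch` are hypothetical names, not part of ASan, and reference counting of repeated dlopen calls is ignored. The epoch stored with a stack is taken to be the number of load/unload events that had happened when the stack was captured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One recorded dlopen/dlclose event (hypothetical layout). */
typedef struct {
    int         is_load;  /* 1 = dlopen, 0 = dlclose */
    const char *name;     /* library path */
    uintptr_t   base;     /* load address, i.e. the slide */
    size_t      size;     /* size of the mapping */
} lib_event;

/* Replay the first `epoch` events and return the load event of the
 * library that contained `pc` at that point, or NULL if none did. */
static const lib_event *resolve_at_epoch(const lib_event *log, size_t log_len,
                                         size_t epoch, uintptr_t pc)
{
    assert(log != NULL && epoch <= log_len);
    const lib_event *owner = NULL;
    for (size_t i = 0; i < epoch; i++) {
        const lib_event *e = &log[i];
        if (e->is_load) {
            /* A newly loaded library covers pc: it becomes the owner. */
            if (pc >= e->base && pc - e->base < e->size)
                owner = e;
        } else if (owner != NULL && owner->base == e->base &&
                   strcmp(owner->name, e->name) == 0) {
            /* The library that covered pc was unloaded. */
            owner = NULL;
        }
    }
    return owner;
}
```

With a log of "load xyz.dylib, unload xyz.dylib, load abc.dylib at an overlapping address", the same pc resolves to xyz.dylib at epoch 1, to nothing at epoch 2, and to abc.dylib at epoch 3, which is the ambiguity the counter is meant to remove.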

Reported by ramosian.glider on 2012-07-18 09:21:08

Answer 1 · 2015-08-31T15:57:54.000Z

Reported by pbos@webrtc.org on 2015-04-23 09:21:22

Blocking: #3402

Answer 2 · 2017-08-10T10:37:40.000Z

Any chance for this getting fixed some day?

Answer 3 · 2017-08-10T14:56:25.000Z

@obfuscated we are not working on this. What exactly do you need?

Answer 4 · 2017-08-10T15:49:25.000Z

@kcc I'm getting unknown modules lines in the callstacks printed by the leak report from asan.

Our application uses plugins (dlopened .so files on linux) quite extensively, so the leak reports aren't too useful. Also there are some leaks that I want to suppress, but I'm not sure I can, because of the lines.

I'm using clang-3.9.1, centos 6, linux.

Answer 5 · 2017-08-10T16:24:52.000Z

Understood.
Yes, this is the exact problem discussed here and no, we don't have plans to address it in near future, sorry.
The reasons are that a) this is not a very common use case among other users and b) the implementation is unlikely to be simple and we already have enough complexity to maintain.

I think there could also be a simple workaround on your side: don't dlclose anything when testing under asan/lsan

Answer 6 · 2017-08-12T22:51:08.000Z

OK, going with the no-dlclose workaround. I remembered that I've done this for valgrind, but there is another place in our code that does dlclose calls, so patching these resolved the problem.

I've stopped seeing the leaks for the global variables in dlopened shared libraries, but I guess this is expected.

Answer 7 · 2017-08-14T18:31:15.000Z

> I've stopped seeing the leaks for the global variables in dlopened shared libraries, but I guess this is expected.

Depends on what exactly you mean here

Answer 8 · 2017-08-15T21:36:45.000Z

I'm still investigating, so I'm not sure if it is a problem of the tool, my setup or real bug in the application.

Answer 9 · 2018-02-16T20:25:43.000Z

Is this still unlikely to be fixed? Given the prevalence of shared libraries in most production systems, it seems very likely that people will trip over this.

We're running into this problem working with the CUDA libraries for example. They have a leak in an internal driver library that they dlopen and dlclose themselves.

Answer 10 · 2018-02-16T21:05:51.000Z

We are not planning any work in this space; my previous comment (Aug 10 2017) still holds.

If you have some CUDA-specific problem I suggest you open a separate issue -- we may be able to find a specialized solution.

Answer 11 · 2018-04-04T18:40:57.000Z

I just ran into this too. It's a common problem for different leak detectors. Our solution (like @kcc suggests and @obfuscated implemented) was to simply not dlclose() any handles when running under a leak detector.

For valgrind the solution was:

#ifdef HAVE_VALGRIND_H
#  include <valgrind.h>
#else
#  define RUNNING_ON_VALGRIND 0
#endif

static int fr_dlfree(dl_t *module)
{
	...
	/*
	 *	Only dlclose() handle if we're *NOT* running under valgrind
	 *	as it unloads the symbols valgrind needs.
	 */
	if (!RUNNING_ON_VALGRIND) dlclose(module->handle);
	...
}

There are a couple of approaches for LSAN detection in this Stack Overflow post.

It would be nice if there were a similar function/macro available to the one valgrind provides, so that we could do this in a way that didn't rely on implementation details.
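Until something like that exists, a compile-time approximation is possible. The sketch below relies on two compiler-provided checks: GCC defines `__SANITIZE_ADDRESS__` under -fsanitize=address, and Clang answers `__has_feature(address_sanitizer)`. The names `RUNNING_ON_ASAN` and `should_dlclose` are hypothetical, chosen to mirror the valgrind macro. Note the limits: this only reports whether this translation unit was built with ASan, so it does not cover a standalone -fsanitize=leak build or a runtime that was merely preloaded.

```c
/* Hypothetical macro, modelled on RUNNING_ON_VALGRIND. */
#if defined(__SANITIZE_ADDRESS__)
#  define RUNNING_ON_ASAN 1      /* GCC with -fsanitize=address */
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define RUNNING_ON_ASAN 1    /* Clang with -fsanitize=address */
#  endif
#endif
#ifndef RUNNING_ON_ASAN
#  define RUNNING_ON_ASAN 0      /* not an ASan build */
#endif

/* Returns 1 when it is fine to really unload a library, 0 when the
 * handle should be kept open so that stacks remain symbolizable. */
int should_dlclose(void)
{
    return !RUNNING_ON_ASAN;
}
```

In the fr_dlfree() example above, the guard would then read `if (!RUNNING_ON_VALGRIND && should_dlclose()) dlclose(module->handle);`.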

Answer 12 · 2018-06-05T18:18:09.000Z

@kcc: If we still don't plan to fix this, can we close this bug?

Answer 13 · 2018-06-05T19:00:09.000Z

Not going to work on this any time soon. Closing for now, will reopen if there is
both high user pressure and resources on our side.

Answer 14 · 2018-06-18T14:48:29.000Z

I have this same problem in multiple projects and a good example of how to reproduce this problem was provided here https://stackoverflow.com/questions/44627258/addresssanitizer-and-loading-of-dynamic-libraries-at-runtime-unknown-module

Valgrind over the same code appears to be working fine for me and tracking down these leaks with ASAN isn't straightforward so a fix for this dynamic linking issue to give better symbols would be appreciated.

Answer 15 · 2018-07-19T15:26:37.000Z

Just in case it's helpful:
When this problem occurs to me I simply do a LD_PRELOAD of a fake dlclose that does nothing and then I don't have to fill my code with #ifdef.. etc.

#include <stdio.h>
int dlclose(void *handle) {
	;
}

LD_PRELOAD="/usr/lib/libasan.so ../fake-dlclose/dlclose.so" ./run

Answer 16 · 2018-09-07T02:00:04.000Z

Just ran into this, mentioning it here in case resources ever become available for a fix

Answer 17 · 2019-02-04T08:08:46.000Z

An option to get a reliable stack trace on Linux is to pass RTLD_NODELETE when loading the library, e.g. dlopen("foo.so", RTLD_NOW | RTLD_NODELETE). This keeps the lib loaded until exit and lets ASAN resolve symbols and report memleaks correctly.

Answer 18 · 2019-04-18T10:00:34.000Z

@bungow Thanks that worked, but you might want to return 0;!

#include <stdio.h>
int dlclose(void*) { return 0; }

clang++ --shared dlclose.c -o libdlclose.so
LD_PRELOAD="./libdlclose.so" ./my_command

Note the ./, otherwise it won't find it.

Answer 19 · 2019-04-18T10:03:00.000Z

Also, since there is a decent workaround, perhaps at the point where it prints <unknown module> it could instead print <unknown module; see https://..../dlopen_workaround.html>?

Answer 20 · 2019-09-27T10:15:56.000Z

I came across this problem again and had zero memory of it ever happening before. Thanks past self! I will make a PR to get a help message added to the output.

Answer 21 · 2019-09-27T13:54:06.000Z

Even better if there could also be a mechanism to detect running under ASAN, so we could implement the workaround reliably. It would probably be very minimal effort, but would be very useful.

Answer 22 · 2019-10-03T13:49:47.000Z

That's a good point. Since ASAN intercepts some methods already, can it not also intercept dlclose()? I don't think there are any downsides to dlclose() being a NOP. On Mac at least it sometimes does nothing anyway.

Answer 23 · 2020-06-05T07:01:26.000Z

@Timmmm did you explore your potential solution? I think this would be great.

Answer 24 · 2020-06-05T09:37:09.000Z

I could imagine certain apps to expect dlclose() to work, so that it's possible to load a library twice.

Answer 25 · 2020-06-05T09:39:48.000Z

@ramosian-glider that's probably true, however what's the more common case?

We could have this as the default behavior, but an ASAN_OPTIONS flag to disable it.

Answer 26 · 2020-06-05T15:03:52.000Z

> @Timmmm did you explore your potential solution? I think this would be great.

No, sorry! I just use the LD_PRELOAD trick.

Answer 27 · 2020-06-05T15:13:22.000Z

@Timmmm no worries, it already helped me a lot to find this trick here. I just think we could save a lot of manhours in finding the same solution for everybody here.

Answer 28 · 2020-11-04T13:12:08.000Z

The downside that I see with removing/disabling dlclose() while sanitizing using any of the methods described is that static memory in the dynamic library won't be freed, so pointers to heap allocated memory stored in static variables will no longer be detected as leaks by LeakSanitizer, while in production code they would be actual leaks.
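To make that downside concrete, here is a minimal sketch of the kind of plugin code that is affected. The names `plugin_cache`, `plugin_init` and `plugin_cache_is_live` are hypothetical. As long as the library stays mapped, the static pointer in its data segment still refers to the heap block, so a leak checker that scans globals treats the block as reachable. After a real dlclose() the data segment is gone, nothing points at the block any more, and it is reported as leaked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical plugin state: a static pointer owns a heap block that
 * the plugin never frees. */
static char *plugin_cache;

/* Called once after the plugin has been dlopen()ed. */
void plugin_init(void)
{
    assert(plugin_cache == NULL);  /* sketch: initialise only once */
    plugin_cache = malloc(64);
}

/* While this returns 1 and the library is still mapped, the block is
 * reachable through plugin_cache and will not show up as a leak. */
int plugin_cache_is_live(void)
{
    return plugin_cache != NULL;
}
```

So the no-dlclose workarounds trade one problem for another: the stacks become symbolizable, but leaks of this shape are no longer reported.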

Answer 29 · 2021-07-02T09:18:11.000Z

This is a bit of a pain for us as we have nearly 200 dynamically loaded modules and no idea which one is leaking. (A marginal number of bytes, but still.)

Unfortunately not calling dlclose(), making it a no-op, or using RTLD_NODELETE means we no longer see the memory leak, as @antoneliasson mentioned.

Answer 30 · 2022-02-03T11:07:56.000Z

Valgrind supports this feature since https://bugs.kde.org/show_bug.cgi?id=79362 with the parameter --keep-debuginfo=yes

Answer 31 · 2022-11-22T13:32:42.000Z

I ran into this issue, and the work-arounds of not unloading the library didn't work for me,
since the leak was only detected when the library was unloaded.

FYI: here are links to the Valgrind patches that implement the equivalent feature:
https://sourceware.org/git/?p=valgrind.git;a=commit;h=cceed053ce876560b9a7512125dd93c7fa059778
https://sourceware.org/git/?p=valgrind.git;a=commit;h=f8ae2f95d6d717aa6d3923635b9f6f87af9b7cf1

Answer 32 · 2023-09-11T13:02:21.000Z

I have a specific case of this problem where all workarounds mentioned here fail: There's a Java connector for a native library libfoo.so loaded with JNA and I want to detect any leaks caused by libfoo while running tests for that connector (./gradlew build, it's a larger Java project by other people). The problem is in system library liblcms2.so.2, which seemingly unloads at random, giving some leaks which are nicely marked as caused by that library and some which are only marked as "unknown module". I know it's this library because if I don't suppress liblcms2, it's always the same leaks, just some/all/none become "unknown module" in repeated runs. I don't know how to fix it.

preloading a fake dlclose has no effect
preloading a library that dlopens liblcms2 with RTLD_NODELETE has no effect

Ideally, I'd like to suppress all leaks which don't involve libfoo, including those coming from "unknown modules" only. I haven't found a way to do that, though - leak suppressions don't offer something like negation.

The whole thing is running in an Ubuntu 18.04 Docker container. The full command is LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libasan.so.4" ASAN_OPTIONS="handle_segv=0" LSAN_OPTIONS="suppressions=lsan.supp" ./gradlew clean build, where the file lsan.supp contains:

leak:libjvm
leak:libjli
leak:libz
leak:liblcms2

Answer 33 · 2024-04-16T22:14:34.000Z

> Just in case it's helpful: When this problem occurs to me I simply do a LD_PRELOAD of a fake dlclose that does nothing and then I don't have to fill my code with #ifdef.. etc.
>
> #include <stdio.h>
> int dlclose(void *handle) {
> 	;
> }
>
> LD_PRELOAD="/usr/lib/libasan.so ../fake-dlclose/dlclose.so" ./run

Another option I found is to override dlopen() to inject RTLD_NODELETE.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

// Override dlopen() and inject RTLD_NODELETE so the library
// doesn't get unloaded on dlclose().
// This helps with asan traces that show <unknown module>.
void* dlopen(const char* filename, int flags){
    typedef void* (*dlopen_t)(const char*, int);
    dlopen_t original_dlopen = (dlopen_t)dlsym(RTLD_NEXT, "dlopen");

    printf("Intercepted a dlopen call, injecting RTLD_NODELETE\n");
    flags |= RTLD_NODELETE;
    return original_dlopen(filename, flags);
}

Build with

gcc-10 -fpic --shared interceptor.c -o libinterceptor.so -ldl

Then pre-load it after asan. Asan must be loaded first. You'll have to adjust the following paths.

LD_PRELOAD=/lib/x86_64-linux-gnu/libasan.so.6.0.0:/home/eriff/dlopeninterceptor/libinterceptor.so ./app

Answer 34 · 2024-04-19T12:22:03.000Z

@ericriff that may work for some people, but unfortunately making dlclose() a no-op, or not calling dlclose() at all, or using RTLD_NODELETE on dlopen() means we no longer see some of the memory leaks, as antoneliasson mentioned in #89 (comment).

The situation is unchanged. For us it is still a pain because our software has 200+ dynamically loaded modules (.so), and when a (small) memory leak is reported, we have no idea which one is leaking.