ASan needs to keep track of all the libraries loaded during the process lifetime
ramosian-glider opened this issue · 34 comments
Originally reported on Google Code with ID 89
In the following situation:
> malloc or free gets called from xyz.dylib
> xyz.dylib gets unloaded
> a bug happens and we want to report the stack trace of malloc/free which has xyz.dylib
in it.
we need to restore the library layout at the stack collection time in order to symbolize
it correctly.
Possible solution:
> We keep an epoch counter that is incremented for each dlopen and
> dlclose (we also write down the [un]loaded library and the slide value
> each time we do that). For each stack we just sacrifice one frame to
> keep the corresponding counter. When symbolizing, it's easy to replay
> the sequence of dlopen/dlclose events and find out which libraries
> were loaded.
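The replay step in the proposal above could look roughly like the sketch below. It is only an illustration of the idea, not ASan code: the names lib_event, loaded_lib and replay_layout are hypothetical, and it assumes the event log is a plain array whose index serves as the epoch counter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One recorded dlopen/dlclose event; its index in the log is the epoch.
 * All names here are hypothetical, chosen for illustration only. */
typedef struct {
    int         is_load;  /* 1 = dlopen, 0 = dlclose */
    const char *path;     /* library that was [un]loaded */
    uintptr_t   slide;    /* slide value recorded with the event */
} lib_event;

/* One library present in the reconstructed layout. */
typedef struct {
    const char *path;
    uintptr_t   slide;
} loaded_lib;

/* Replay events[0..epoch) and store the libraries that were loaded at that
 * point in `out`. Returns the number of entries written (at most out_cap). */
static size_t replay_layout(const lib_event *events, size_t epoch,
                            loaded_lib *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < epoch; i++) {
        const lib_event *e = &events[i];
        if (e->is_load) {
            if (n < out_cap) {
                out[n].path = e->path;
                out[n].slide = e->slide;
                n++;
            }
        } else {
            /* Drop the matching entry by swapping in the last one. */
            for (size_t j = 0; j < n; j++) {
                if (out[j].slide == e->slide &&
                    strcmp(out[j].path, e->path) == 0) {
                    out[j] = out[n - 1];
                    n--;
                    break;
                }
            }
        }
    }
    return n;
}
```

A stack captured at epoch N would then be symbolized against replay_layout(events, N, ...) rather than against the libraries that happen to be loaded at report time.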
Reported by ramosian.glider
on 2012-07-18 09:21:08
Reported by pbos@webrtc.org
on 2015-04-23 09:21:22
- Blocking: #3402
Any chance for this getting fixed some day?
@obfuscated we are not working on this. What exactly do you need?
@kcc I'm getting unknown modules lines in the callstacks printed by the leak report from asan.
Our application uses plugins (dlopened .so files on linux) quite extensively, so the leak reports aren't too useful. Also there are some leaks that I want to suppress, but I'm not sure I can, because of the lines.
I'm using clang-3.9.1, centos 6, linux.
Understood.
Yes, this is the exact problem discussed here and no, we don't have plans to address it in near future, sorry.
The reasons are that a) this is not a very common use case among other users and b) implementation is unlikely to be simple and we already have enough complexity to maintain.
I think there could also be a simple workaround on your side: don't dlclose anything when testing under asan/lsan
OK, going with the no-dlclose workaround. I remembered that I've done this for valgrind, but there is another place in our code that does dlclose calls, so patching these resolved the problem.
I've stopped seeing the leaks for the global variables in dlopened shared libraries, but I guess this is expected.
> I've stopped seeing the leaks for the global variables in dlopened shared libraries, but I guess this is expected.
Depends on what exactly you mean here
I'm still investigating, so I'm not sure if it is a problem of the tool, my setup or real bug in the application.
Is this still unlikely to be fixed? Given the prevalence of shared libraries in most production systems, it seems very likely that people will trip over this.
We're running into this problem working with the CUDA libraries for example. They have a leak in an internal driver library that they dlopen and dlclose themselves.
We are not planning any work in this space; my previous comment (Aug 10 2017) still holds.
If you have some CUDA-specific problem I suggest you open a separate issue -- we may be able to find a specialized solution.
I just ran into this too. It's a common problem for different leak detectors. Our solution (like @kcc suggests and @obfuscated implemented) was to simply not dlclose() any handles when running under a leak detector.
For valgrind the solution was:
#ifdef HAVE_VALGRIND_H
#  include <valgrind.h>
#else
#  define RUNNING_ON_VALGRIND 0
#endif
static int fr_dlfree(dl_t *module)
{
    ...
    /*
     * Only dlclose() handle if we're *NOT* running under valgrind
     * as it unloads the symbols valgrind needs.
     */
    if (!RUNNING_ON_VALGRIND) dlclose(module->handle);
    ...
}
There's a couple of approaches for LSAN detection in this stack overflow post.
It would be nice if there were a similar function/macro available to the one valgrind provides, so that we could do this in a way that didn't rely on implementation details.
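For code compiled with -fsanitize=address, a compile-time check is possible: Clang exposes __has_feature(address_sanitizer) and GCC defines __SANITIZE_ADDRESS__. The sketch below wraps both in a macro; BUILT_WITH_ASAN and close_module_handle are hypothetical names. Note the limits: this only tells you how the current translation unit was built. It is not a runtime check like RUNNING_ON_VALGRIND, and it does not detect standalone LSan or a sanitizer runtime injected via LD_PRELOAD.

```c
#include <dlfcn.h>

/* Clang: __has_feature(address_sanitizer). GCC: __SANITIZE_ADDRESS__.
 * BUILT_WITH_ASAN is a hypothetical macro name used for this sketch. */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define BUILT_WITH_ASAN 1
#  endif
#endif
#if !defined(BUILT_WITH_ASAN) && defined(__SANITIZE_ADDRESS__)
#  define BUILT_WITH_ASAN 1
#endif
#ifndef BUILT_WITH_ASAN
#  define BUILT_WITH_ASAN 0
#endif

/* Hypothetical helper: skip dlclose() in ASan builds so that the library
 * stays mapped and its frames can still be symbolized in reports.
 * Returns 0 when the close was skipped or succeeded. */
static int close_module_handle(void *handle)
{
    if (BUILT_WITH_ASAN)
        return 0;
    return dlclose(handle);
}
```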
Not going to work on this any time soon. Closing for now, will reopen if there is
both high user pressure and resources on our side.
I have this same problem in multiple projects and a good example of how to reproduce this problem was provided here https://stackoverflow.com/questions/44627258/addresssanitizer-and-loading-of-dynamic-libraries-at-runtime-unknown-module
Valgrind over the same code appears to be working fine for me and tracking down these leaks with ASAN isn't straightforward so a fix for this dynamic linking issue to give better symbols would be appreciated.
Just in case it's helpful:
When this problem occurs to me I simply do a LD_PRELOAD of a fake dlclose that does nothing and then I don't have to fill my code with #ifdef.. etc.
#include <stdio.h>
int dlclose(void *handle) {
;
}
LD_PRELOAD="/usr/lib/libasan.so ../fake-dlclose/dlclose.so" ./run
Just ran into this, mentioning it here in case resources ever become available for a fix
An option to get a reliable stack trace on Linux is to use dlopen(foo.so, RTLD_NODELETE) at library loading. This keeps the lib loaded at exit and lets ASAN resolve symbols and report memleaks correctly.
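A minimal sketch of that approach, assuming you control the code that loads the plugins. The wrapper name open_plugin_nodelete is hypothetical; RTLD_NODELETE has to be OR-ed with a binding mode such as RTLD_NOW, since dlopen() requires one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical wrapper: load a plugin so that a later dlclose() does not
 * unmap it, which keeps its code available for symbolization at exit. */
static void *open_plugin_nodelete(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_NODELETE);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return handle;
}
```

On older glibc versions this needs to be linked with -ldl.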
@bungow Thanks, that worked, but you might want to add return 0; to the function!
#include <stdio.h>
int dlclose(void*) { return 0; }
clang++ --shared dlclose.c -o libdlclose.so
LD_PRELOAD="./libdlclose.so" ./my_command
Note the ./, otherwise it won't find it.
Also, since there is a decent workaround, perhaps at the point where it prints <unknown module> it could instead print <unknown module; see https://..../dlopen_workaround.html>?
I came across this problem again and had zero memory of it ever happening before. Thanks past self! I will make a PR to get a help message added to the output.
Even better if there could also be a mechanism to detect running under ASAN, so we could implement the workaround reliably. It would probably be very minimal effort, but would be very useful.
That's a good point. Since ASAN intercepts some methods already, can it not also intercept dlclose()? I don't think there are any downsides to dlclose() being a NOP. On Mac at least it sometimes does nothing anyway.
@Timmmm did you explore your potential solution? I think this would be great.
I could imagine certain apps to expect dlclose() to work, so that it's possible to load a library twice.
@ramosian-glider that's probably true, however what's the more common case?
We could have this as the default behavior, but an ASAN_OPTIONS flag to disable it.
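Until something like that exists in ASan itself, the same default-off behavior can be approximated in an LD_PRELOAD shim. The sketch below makes dlclose() a no-op unless an environment variable asks for the real one. The variable name FAKE_DLCLOSE_PASSTHROUGH is made up for this example and is not an ASAN_OPTIONS flag.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for RTLD_NEXT */
#endif
#include <dlfcn.h>
#include <stdlib.h>
#include <string.h>

/* The extern "C" guard keeps the symbol unmangled if this file is built
 * with a C++ compiler; otherwise it would not interpose dlclose. */
#ifdef __cplusplus
extern "C"
#endif
int dlclose(void *handle)
{
    /* FAKE_DLCLOSE_PASSTHROUGH is a hypothetical opt-out switch. */
    const char *mode = getenv("FAKE_DLCLOSE_PASSTHROUGH");
    if (mode != NULL && strcmp(mode, "1") == 0) {
        /* Forward to the next dlclose in the lookup order (normally libc). */
        int (*real_dlclose)(void *) =
            (int (*)(void *))dlsym(RTLD_NEXT, "dlclose");
        if (real_dlclose != NULL)
            return real_dlclose(handle);
    }
    (void)handle;   /* default: keep the library loaded */
    return 0;       /* report success, as dlclose() callers expect */
}
```

Built as a shared object and preloaded, this leaves apps that rely on real unloading a way to opt out per run without rebuilding anything.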
> @Timmmm did you explore your potential solution? I think this would be great.
No, sorry! I just use the LD_PRELOAD trick.
@Timmmm no worries, it already helped me a lot to find this trick here. I just think we could save a lot of manhours in finding the same solution for everybody here.
The downside that I see with removing/disabling dlclose() while sanitizing using any of the methods described is that static memory in the dynamic library won't be freed, so pointers to heap allocated memory stored in static variables will no longer be detected as leaks by LeakSanitizer, while in production code they would be actual leaks.
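To make that downside concrete, here is a hypothetical plugin fragment (plugin_cache and plugin_init are invented names). As long as the plugin stays mapped, the static pointer lives in a region LeakSanitizer scans for roots, so the block counts as reachable and is not reported. After a real dlclose() the pointer disappears with the mapping and the same block becomes a reportable leak.

```c
#include <stdlib.h>

/* Hypothetical plugin state: a heap block referenced only from a static. */
static char *plugin_cache = NULL;

/* Called once by the host after dlopen(); there is no matching cleanup,
 * which is the bug that only shows up once the library is unloaded. */
void plugin_init(void)
{
    if (plugin_cache == NULL)
        plugin_cache = (char *)malloc(64);
}
```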
This is a bit of a pain for us as we have nearly 200 dynamically loaded modules and no idea which one is leaking. (A marginal number of bytes, but still)
Unfortunately not calling dlclose(), making it a no-op, or using RTLD_NODELETE means we no longer see the memory leak, as @antoneliasson mentioned.
Valgrind supports this feature since https://bugs.kde.org/show_bug.cgi?id=79362 with the parameter --keep-debuginfo=yes
I ran into this issue, and the work-arounds of not unloading the library didn't work for me,
since the leak was only detected when the library was unloaded.
FYI: here are links to the patches that added the equivalent feature to Valgrind:
https://sourceware.org/git/?p=valgrind.git;a=commit;h=cceed053ce876560b9a7512125dd93c7fa059778
https://sourceware.org/git/?p=valgrind.git;a=commit;h=f8ae2f95d6d717aa6d3923635b9f6f87af9b7cf1
I have a specific case of this problem where all workarounds mentioned here fail: There's a Java connector for a native library libfoo.so loaded with JNA and I want to detect any leaks caused by libfoo while running tests for that connector (./gradlew build, it's a larger Java project by other people). The problem is in system library liblcms2.so.2, which seemingly unloads at random, giving some leaks which are nicely marked as caused by that library and some which are only marked as "unknown module". I know it's this library because if I don't suppress liblcms2, it's always the same leaks, just some/all/none become "unknown module" in repeated runs. I don't know how to fix it.
- preloading a fake dlclose has no effect
- preloading a library that dlopens liblcms2 with RTLD_NODELETE has no effect
Ideally, I'd like to suppress all leaks which don't involve libfoo, including those coming from "unknown modules" only. I haven't found a way to do that, though - leak suppressions don't offer something like negation.
The whole thing is running in a Ubuntu 18.04 docker container. Full command is LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libasan.so.4" ASAN_OPTIONS="handle_segv=0" LSAN_OPTIONS="suppressions=lsan.supp" ./gradlew clean build, where the file lsan.supp contains
leak:libjvm
leak:libjli
leak:libz
leak:liblcms2
> Just in case it's helpful: When this problem occurs to me I simply do a LD_PRELOAD of a fake dlclose that does nothing and then I don't have to fill my code with #ifdef.. etc.
> #include <stdio.h>
> int dlclose(void *handle) {
> ;
> }
> LD_PRELOAD="/usr/lib/libasan.so ../fake-dlclose/dlclose.so" ./run
Another option I found is to override dlopen() to inject RTLD_NODELETE.
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>
#include <string.h>
// Override the dlopen() function and inject RTLD_NODELETE so the library
// doesn't get unloaded on dlclose().
// This helps with asan traces that show <unknown module>.
void* dlopen(const char* filename, int flags){
    typedef void* (*dlopen_t)(const char*, int);
    // Look up the next dlopen in the search order (the real one, or asan's).
    dlopen_t original_dlopen = (dlopen_t)dlsym(RTLD_NEXT, "dlopen");
    printf("Intercepted a dlopen call, injecting RTLD_NODELETE\n");
    flags |= RTLD_NODELETE;
    return original_dlopen(filename, flags);
}
Build with
gcc-10 -fpic --shared interceptor.c -o libinterceptor.so -ldl
Then pre-load it after asan. Asan must be loaded first. You'll have to adjust the following paths.
LD_PRELOAD=/lib/x86_64-linux-gnu/libasan.so.6.0.0:/home/eriff/dlopeninterceptor/libinterceptor.so ./app
@ericriff that may work for some people, but unfortunately making dlclose() a no-op, or not calling dlclose() at all, or using RTLD_NODELETE on dlopen() means we no longer see some of the memory leaks, as antoneliasson mentioned in #89 (comment).
The situation is unchanged. For us it is still a pain because in our software we have 200+ dynamically loaded modules (.so) and when a (small) memory leak is reported, we have no idea which one is leaking.