Conflicts with other utilities/libraries overriding application behavior via LD_PRELOAD

Question

Closed this issue 12 years ago · 5 comments

...like libeatmydata, for example. If the LD_PRELOAD variable is already defined, it should be extended, not overridden. Here's an example patch:

diff --git a/nocache b/nocache
index f6df6b1..be36b61 100755
--- a/nocache
+++ b/nocache
@@ -1,3 +1,10 @@
 #!/bin/sh
-export LD_PRELOAD="./nocache.so"
+libnocache="/usr/local/lib/nocache.so"
+
+if [ -n "$LD_PRELOAD" ]; then
+    export LD_PRELOAD="$libnocache $LD_PRELOAD"
+else
+    export LD_PRELOAD="$libnocache"
+fi
+
 exec "$@"
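For illustration, the same extend-instead-of-override logic can be sketched more compactly with the POSIX `${var:+word}` expansion. This is only a sketch, not what was committed: the helper name `prepend_preload` is hypothetical, and the library path is simply the one used in the patch above. The dynamic linker accepts spaces or colons as separators in LD_PRELOAD, so the space used in the patch is kept here.

```shell
#!/bin/sh
# Hypothetical helper (not part of nocache): print LIB followed by the
# existing preload list, if there is one.
# Usage: prepend_preload LIB [CURRENT_LD_PRELOAD]
prepend_preload() {
    lib=$1
    current=${2-}
    # ${current:+ $current} expands to " $current" only when current is
    # set and non-empty, so no stray separator is emitted otherwise.
    printf '%s\n' "$lib${current:+ $current}"
}

# Assumed install path, taken from the patch above.
libnocache="/usr/local/lib/nocache.so"

prepend_preload "$libnocache" ""
prepend_preload "$libnocache" "libeatmydata.so"
```

In a wrapper script this would be used as `export LD_PRELOAD="$(prepend_preload "$libnocache" "${LD_PRELOAD-}")"` just before `exec "$@"`.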

Answer 1 · 2013-04-05T12:28:25.000Z

Thanks for the patch! I have applied it, only changing the first line in order to use the nocache library in the current working directory. (This will be the most common use case for people trying it out.)

Answer 2 · 2013-04-08T08:33:45.000Z

Julius, you should also reflect it in nocache.global. Btw, thanks for developing this super-useful utility.

Answer 3 · 2013-04-09T20:46:00.000Z

Thanks! I forgot about that one.

I’m glad you like the tool. To be honest, I only wrote this as a proof of concept. I’m not actively using it anywhere, so I would be interested to know where/how you’re using it…

Answer 4 · 2013-04-10T00:48:33.000Z

Well, anywhere a one-time scanning/streaming access pattern threatens to flush useful data out of the page cache :-) For example:

backups (files/rsync/etc.)
antivirus scans, or otherwise searching a large data set
copying or otherwise streaming huge files
folks report it to be useful on machines sharing large file libraries via P2P
whenever the main memory available for page cache is much smaller than the actual disk storage, e.g. a NAS/SAN with terabytes/petabytes of data
whenever disk storage has high random-access latency (e.g. a 4200 rpm laptop hard disk or an optical drive)
when an app does its own caching (e.g. a database)

Wikipedia describes some of the pitfall scenarios for common caching algorithms:
https://en.wikipedia.org/wiki/Page_replacement_algorithms
https://en.wikipedia.org/wiki/Adaptive_Replacement_Cache

OS tries its best to guess and do the right thing, but sometimes manually instrumenting it via hints still goes a long way. Manually tweaking caching behavior of a NAS/filer helps achieve higher IO utilization rates.

Come to think of it, you might consider merging nocache with libeatmydata, since both solve related problems via similar approaches.

Answer 5 · 2013-04-10T06:35:55.000Z

Hi,

> OS tries its best to guess and do the right thing, but sometimes
> manually instrumenting it via hints still goes a long way.

Some programs already set the hinting flag by themselves, for others
you can use an LD_PRELOAD hack like this one to hijack the syscall
wrappers.

The problem I see is that on Linux, we can only use the
POSIX_FADV_DONTNEED flag, which is used to mark pages that have
already been accessed as “I won’t need them any more”.

The semantically correct way would be for a backup application to use
POSIX_FADV_NOREUSE, saying “I’ll access it once”, and the kernel
should do the rest. However, this flag is a no-op right now (see
mm/fadvise.c in the kernel source).

There were efforts put forward to implement this flag properly (and
also change some other fadvise() semantics IIRC), but the work was
not merged: https://lwn.net/Articles/449420/

> Come to think of it, you might consider merging nocache with
> libeatmydata, since both solve related problems via similar
> approaches.

I will have a look at it.

Thanks,
Julius