Using tcmalloc on x86_64 is too slow
GoogleCodeExporter opened this issue · 8 comments
GoogleCodeExporter commented
My system:
Linux onlinegame10-119 2.6.30-1-amd64 #1 SMP Mon Aug 3 12:28:22 UTC 2009 x86_64 GNU/Linux
When I link with -ltcmalloc, everything is OK. But when I set the environment with "export HEAPPROFILE=/tmp/profile" and "export HEAP_PROFILE_INUSE_INTERVAL=52428800" (50 MiB), my program becomes very slow, and its memory use keeps rising higher and higher. The same setup works fine on a 32-bit system.
Original issue reported on code.google.com by jbyw...@gmail.com
on 29 Dec 2009 at 8:43
GoogleCodeExporter commented
Hmm, this hasn't been our experience with x86_64. It's true heap profiling requires more memory than a normal run, but it should be similar between i386 and x86_64. Well, within a factor of 2, since pointers are twice as big on 64-bit systems.
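To see where that factor of 2 comes from, consider a hypothetical per-allocation record made only of pointer-sized fields. This is an illustration, not tcmalloc's actual data layout; AllocRecord, kStackDepth and bookkeeping_bytes are made-up names. Because size_t has the same width as a pointer on both i386 and x86_64 Linux, the record is exactly (2 + kStackDepth) pointers wide, so it takes twice as many bytes on a 64-bit build.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical capture depth for the call stack of each allocation.
constexpr std::size_t kStackDepth = 32;

// Hypothetical per-allocation record: every field is pointer-sized,
// so sizeof(AllocRecord) scales directly with sizeof(void*).
struct AllocRecord {
  const void* ptr;                  // address returned by malloc
  std::size_t bytes;                // requested size
  const void* stack[kStackDepth];   // return addresses of the caller's stack
};

// Bookkeeping bytes needed to track n_live outstanding allocations.
std::size_t bookkeeping_bytes(std::size_t n_live) {
  return n_live * sizeof(AllocRecord);
}
```

With kStackDepth = 32 that is 34 pointers: 136 bytes per record on i386 and 272 bytes on x86_64.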
Do the heap profiles that are created look correct when you examine them in pprof?
It's hard to diagnose this kind of report remotely. If there's anything you can do to look at the program while it runs under the heap profiler, to figure out why it's so slow, that might help. Perhaps run it under strace or ltrace? Or run it with the CPU profiler as well, and see where all the time is being spent?
Original comment by csilv...@gmail.com
on 29 Dec 2009 at 7:36
- Added labels: Priority-Medium, Type-Defect
GoogleCodeExporter commented
I used opcontrol (oprofile) to profile.
Without heap profiling:
  180449  21.1349  no-vmlinux      a __crc_ide_do_reset [ide_core]
   82054   9.6105  no-vmlinux      a __crc_snd_timer_continue [snd_timer]
   63308   7.4149  mysqld          (no symbols)
   50481   5.9125  liblua.so       luaV_execute(lua_State*, int)
   50379   5.9006  oprofiled       (no symbols)
   26678   3.1246  no-vmlinux      a __crc_scsi_bios_ptable [scsi_mod]
With heap profiling:
   99473  34.7366  no-vmlinux      a __crc_ide_do_reset [ide_core]
   48070  16.7863  no-vmlinux      a __crc_snd_timer_continue [snd_timer]
   13369   4.6685  oprofiled       (no symbols)
   12808   4.4726  mysqld          (no symbols)
    8308   2.9012  libunwind.so.7  apply_reg_state
    7977   2.7856  no-vmlinux      a __crc_scsi_bios_ptable [scsi_mod]
We can see that __crc_ide_do_reset and __crc_snd_timer_continue use more CPU. The ever-rising memory use is caused by my program.
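One caveat when reading these two reports: opreport's first column is the raw sample count and the second is that symbol's share of all samples in the same run, so a larger percentage does not by itself mean more CPU time. A small sketch that recovers a run's total sample count from any single row (total_samples is a hypothetical helper name):

```cpp
#include <cassert>

// Given one opreport row (raw samples, percent of the run's samples),
// return the total number of samples recorded in that run.
double total_samples(double samples, double percent) {
  assert(percent > 0.0 && percent <= 100.0);
  return samples * 100.0 / percent;
}
```

For the first rows quoted above this gives roughly 853,800 samples without heap profiling and roughly 286,400 with it. So __crc_ide_do_reset's share rose from about 21% to about 35% while its raw count fell from 180,449 to 99,473; whether that means less kernel time depends on whether both runs covered the same workload for the same duration.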
Original comment by jbyw...@gmail.com
on 31 Dec 2009 at 8:45
GoogleCodeExporter commented
Well, it's no surprise that the heap profiler uses more CPU. Though I don't know what __crc_snd_timer_continue is. Based on a web search, it seems to have something to do with sound. I don't know that it's related to the heap profiler. The IDE reset is related to disk activity, which is no surprise since the heap profiler writes to disk. I don't think these profiles tell us very much. If it were me, I'd try compiling with -pg rather than using oprofile; perhaps that would say more?
As for memory use, perhaps ltrace is the best way to figure out what's going on. Are you sure memory use keeps rising without bound?
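One way to answer that question without ltrace is to have the program log its own resident set size at intervals, once with HEAPPROFILE set and once without, and compare the two curves. A minimal sketch, assuming Linux (it reads the VmRSS line of /proc/self/status); current_rss_kb is a hypothetical helper name:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Returns this process's resident set size in kB, parsed from the
// "VmRSS:" line of /proc/self/status (Linux-specific), or -1 if the
// file or the line is unavailable.
long current_rss_kb() {
  std::ifstream status("/proc/self/status");
  std::string line;
  while (std::getline(status, line)) {
    if (line.rfind("VmRSS:", 0) == 0) {        // line starts with "VmRSS:"
      std::istringstream fields(line.substr(6));
      long kb = -1;
      fields >> kb;                            // number is followed by "kB"
      return kb;
    }
  }
  return -1;
}
```

Printing the value periodically (for example once per main-loop pass) makes it easy to tell steady, unbounded growth from growth that levels off.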
Original comment by csilv...@gmail.com
on 31 Dec 2009 at 5:44
GoogleCodeExporter commented
Any more word on this?
Original comment by csilv...@gmail.com
on 10 Mar 2010 at 6:52
GoogleCodeExporter commented
It's been 6 months with no feedback, so I'm closing this CannotReproduce. Feel
free to reopen if you want to start looking into it again.
Original comment by csilv...@gmail.com
on 7 Jun 2010 at 10:48
- Changed state: CannotReproduce
GoogleCodeExporter commented
I think I'm having the same problem. I'm running on x86_64, and activating heap profiling causes a drastic slowdown (several orders of magnitude for a simple program; see below). I don't have much experience examining why this would be so, but I'm happy to follow any instructions.
$ cat tcmalloc_test.cpp
#include <stdlib.h>
int main(int argc, char ** argv) {
    for (int i = 0; i < 100000; i++) {
        malloc(1);
    }
return 0;
}
$ g++ -o tcmalloc_test tcmalloc_test.cpp -ltcmalloc
$ time ./tcmalloc_test
real 0m0.010s
user 0m0.000s
sys 0m0.010s
$ cat run-with-profiler.sh
#!/bin/bash
HEAPPROFILE=/tmp/foo ./tcmalloc_test
$ time ./run-with-profiler.sh
Starting tracking the heap
Dumping heap profile to /tmp/foo.0001.heap (Exiting)
real 0m8.337s
user 0m0.510s
sys 0m7.810s
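Note that almost all of the extra time is system time (7.81 s of system time against 0.51 s of user time), i.e. time the kernel spends on the process's behalf rather than time in user-space code. To see whether that kernel time falls inside the allocation loop or around it (startup, the final dump), the loop can be timed from inside the program. A minimal sketch, assuming a POSIX system with getrusage(); CpuTimes, cpu_times and time_malloc_loop are hypothetical names, and the loop mirrors tcmalloc_test.cpp:

```cpp
#include <sys/resource.h>
#include <sys/time.h>
#include <cassert>
#include <cstdlib>

struct CpuTimes {
  double user;  // seconds of user-mode CPU time
  double sys;   // seconds of kernel-mode CPU time
};

static double to_seconds(const timeval& tv) {
  return static_cast<double>(tv.tv_sec) + static_cast<double>(tv.tv_usec) / 1e6;
}

// CPU time consumed by this process so far, via POSIX getrusage().
CpuTimes cpu_times() {
  rusage ru{};
  int rc = getrusage(RUSAGE_SELF, &ru);
  assert(rc == 0);
  (void)rc;  // avoid an unused-variable warning when NDEBUG is defined
  return {to_seconds(ru.ru_utime), to_seconds(ru.ru_stime)};
}

// Runs the same loop as tcmalloc_test.cpp and returns only the CPU
// time spent inside it.
CpuTimes time_malloc_loop(int iterations) {
  const CpuTimes before = cpu_times();
  for (int i = 0; i < iterations; i++) {
    // Leaked on purpose, as in tcmalloc_test.cpp; volatile keeps the
    // otherwise-unused call from being optimized away.
    void* volatile p = std::malloc(1);
    (void)p;
  }
  const CpuTimes after = cpu_times();
  return {after.user - before.user, after.sys - before.sys};
}
```

Printing both fields once with HEAPPROFILE set and once without would show whether the extra system time is spent during the allocations themselves.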
Original comment by alex.fl...@gmail.com
on 25 Aug 2010 at 5:48
GoogleCodeExporter commented
The story gets stranger. Upon recompiling google-perftools with -pg, the slowdown actually disappears! I compiled google-perftools with:
$ CFLAGS="-pg -g" CPPFLAGS="-pg -g" LDFLAGS="-pg -g" ./configure && make -j 3 && sudo make install
I increased the number of iterations of the loop in tcmalloc_test.cpp to 100000000 (100M rather than 100K), and I now get 0.338s for the heap-profiled version versus 0.056s for the non-profiled version. I assume this is within the expected slowdown caused by writing to disk, etc.
Original comment by alex.fl...@gmail.com
on 25 Aug 2010 at 5:56
GoogleCodeExporter commented
Hmm, curious. Another thing you changed with -pg is that you took out the optimization. Try with just CPPFLAGS="-g" ./configure and see how that looks.
} I don't have much experience at examining why this would be so but am happy to
} follow any instructions
If -pg doesn't prove to be helpful, you could also try linking in tcmalloc's
profiler: add -lprofiler to the link line. I believe you should get both a
heap profile and cpu profile in that case, though I've never tried it.
Original comment by csilv...@gmail.com
on 25 Aug 2010 at 6:48