niuys/gperftools

tcmalloc is too slow on x86_64

GoogleCodeExporter opened this issue · 8 comments

My system:
    Linux onlinegame10-119 2.6.30-1-amd64 #1 SMP Mon Aug 3 12:28:22 UTC 2009 x86_64 GNU/Linux

When I link tcmalloc with -ltcmalloc, everything is fine. But when I set the
environment with "export HEAPPROFILE=/tmp/profile" and
"export HEAP_PROFILE_INUSE_INTERVAL=52428800", my program becomes very slow and
its memory use keeps climbing. The same setup works fine on a 32-bit system.
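
As an aside on the interval setting above: the value is a plain byte count, and it is easy to misjudge by a factor of 1024. The following is a minimal sketch, not part of the original report, that converts such a value to MiB. The helper name `bytes_text_to_mib` is hypothetical and the code uses only the C++ standard library, not any tcmalloc API.

```cpp
#include <cstdlib>

// Hypothetical helper: parses a decimal byte count, such as the value of
// HEAP_PROFILE_INUSE_INTERVAL, and returns it in MiB.
// Returns -1.0 if the text is missing, negative, or not a plain number.
double bytes_text_to_mib(const char* text) {
    if (text == nullptr || *text == '\0') {
        return -1.0;
    }
    char* end = nullptr;
    const long long bytes = std::strtoll(text, &end, 10);
    if (*end != '\0' || bytes < 0) {
        return -1.0;  // trailing garbage or a negative value
    }
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

Calling `bytes_text_to_mib(std::getenv("HEAP_PROFILE_INUSE_INTERVAL"))` inside the affected program would also confirm that the variable actually reaches the process. For the value in the report, 52428800 bytes is 50 MiB.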

Original issue reported on code.google.com by jbyw...@gmail.com on 29 Dec 2009 at 8:43

Hmm, this hasn't been our experience with x86_64.  It's true that heap
profiling requires more memory than a normal run, but it should be similar
between i386 and x86_64.  Well, within a factor of 2, since pointers are twice
as big on 64-bit systems.
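
To make the factor-of-2 estimate concrete, here is a minimal sketch that reports the pointer size of the current build and the space taken by the return addresses of one recorded call stack. It assumes, purely for illustration, that the profiler's extra memory is dominated by stored stack frames; the helper names and the depth of 32 frames are hypothetical and say nothing about tcmalloc's internals.

```cpp
#include <cstddef>
#include <cstdio>

// Size of a data pointer in bytes for the current build:
// typically 4 on i386 and 8 on x86_64.
constexpr std::size_t pointer_size_bytes() { return sizeof(void*); }

// Hypothetical estimate: bytes needed to store the return addresses of one
// call stack that is `depth` frames deep (frame pointers only, no metadata).
constexpr std::size_t stack_trace_bytes(std::size_t depth) {
    return depth * pointer_size_bytes();
}

// Prints both figures so a 32-bit and a 64-bit build can be compared.
void print_pointer_overhead() {
    std::printf("pointer size: %zu bytes, 32-frame stack: %zu bytes\n",
                pointer_size_bytes(), stack_trace_bytes(32));
}
```

Building this once with -m32 and once with -m64 and calling `print_pointer_overhead()` from `main` shows the second figure doubling, which is the upper bound on the difference described above.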

Do the heap profiles that are created look correct when you examine them in
pprof?  It's hard to diagnose this kind of report remotely.  If there's
anything you can do to look around at the program being run under the heap
profiler, to figure out why it's so slow, that might help.  Perhaps run under
strace or ltrace?  Or run with the cpu profiler as well, and see where all the
time is being spent?

Original comment by csilv...@gmail.com on 29 Dec 2009 at 7:36

  • Added labels: Priority-Medium, Type-Defect
I used opcontrol (oprofile) to profile.
Without heap profiling:
180449   21.1349  no-vmlinux               a __crc_ide_do_reset [ide_core]
82054     9.6105  no-vmlinux               a __crc_snd_timer_continue [snd_timer]
63308     7.4149  mysqld                   (no symbols)
50481     5.9125  liblua.so                luaV_execute(lua_State*, int)
50379     5.9006  oprofiled                (no symbols)
26678     3.1246  no-vmlinux               a __crc_scsi_bios_ptable [scsi_mod]

With heap profiling:
99473    34.7366  no-vmlinux               a __crc_ide_do_reset [ide_core]
48070    16.7863  no-vmlinux               a __crc_snd_timer_continue [snd_timer]
13369     4.6685  oprofiled                (no symbols)
12808     4.4726  mysqld                   (no symbols)
8308      2.9012  libunwind.so.7           apply_reg_state
7977      2.7856  no-vmlinux               a __crc_scsi_bios_ptable [scsi_mod]

We can see that __crc_ide_do_reset and __crc_snd_timer_continue take a larger
share of the CPU samples when heap profiling is enabled.

The memory growing higher and higher turned out to be caused by my own program.

Original comment by jbyw...@gmail.com on 31 Dec 2009 at 8:45

Well, it's no surprise that the heap-profiler uses more CPU.  Though I don't
know what __crc_snd_timer_continue is.  Based on a web search, it seems to have
something to do with sound.  I don't know that it's related to the heap
profile.  The ide reset is related to disk activity, no surprise since the heap
profile writes to disk.  I don't think these profiles tell us very much.  If it
were me, I'd try compiling with -pg rather than use oprofile -- perhaps that
would say more?

As for memory use, perhaps ltrace is the best way to figure out what's going
on.  Are you sure memory use keeps rising without bound?

Original comment by csilv...@gmail.com on 31 Dec 2009 at 5:44

Any more word on this?

Original comment by csilv...@gmail.com on 10 Mar 2010 at 6:52

It's been 6 months with no feedback, so I'm closing this as CannotReproduce.
Feel free to reopen if you want to start looking into it again.

Original comment by csilv...@gmail.com on 7 Jun 2010 at 10:48

  • Changed state: CannotReproduce
I think I'm having the same problem. I'm running on x86_64 and activating heap
profiling causes a drastic slowdown (several orders of magnitude for a simple
program, see below). I don't have much experience at examining why this would
be so but am happy to follow any instructions.

$ cat tcmalloc_test.cpp 
#include <stdlib.h>
int main(int argc, char ** argv) {
    for (int i = 0; i < 100000; i++) {
        malloc(1);
    }
    return 0;
}

$ g++ -o tcmalloc_test tcmalloc_test.cpp -ltcmalloc

$ time ./tcmalloc_test 

real    0m0.010s
user    0m0.000s
sys     0m0.010s

$ cat run-with-profiler.sh 
#!/bin/bash
HEAPPROFILE=/tmp/foo ./tcmalloc_test

$ time ./run-with-profiler.sh 
Starting tracking the heap
Dumping heap profile to /tmp/foo.0001.heap (Exiting)

real    0m8.337s
user    0m0.510s
sys     0m7.810s
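
The `time` figures above cover the whole process, including startup and the final profile dump. A minimal sketch, not part of the original report, that times only the allocation loop from inside the process could look like the following. `time_small_mallocs` is a hypothetical helper name, and unlike tcmalloc_test.cpp it frees the blocks afterwards so the measurement can be repeated.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: performs `count` one-byte allocations, as in the loop
// of tcmalloc_test.cpp, and returns the elapsed wall-clock time in seconds.
double time_small_mallocs(int count) {
    if (count <= 0) {
        return 0.0;
    }
    // Reserve up front so that vector growth is not part of the timed region.
    std::vector<void*> blocks;
    blocks.reserve(static_cast<std::size_t>(count));

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i) {
        blocks.push_back(std::malloc(1));  // keep the pointer so the call is not elided
    }
    const auto stop = std::chrono::steady_clock::now();

    // Release the memory outside the timed region.
    for (void* block : blocks) {
        std::free(block);
    }
    return std::chrono::duration<double>(stop - start).count();
}
```

Printing `time_small_mallocs(100000)` from `main`, once with and once without HEAPPROFILE set, would show whether the slowdown sits in the allocation path itself or elsewhere in the run.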

Original comment by alex.fl...@gmail.com on 25 Aug 2010 at 5:48

The story gets stranger. Upon recompiling google-perftools with -pg the 
slowdown actually disappears! I compiled google-perftools with:

$ CFLAGS="-pg -g" CPPFLAGS="-pg -g" LDFLAGS="-pg -g" ./configure && make -j 3 && sudo make install

I increased the number of iterations of the loop in tcmalloc_test.cpp to 
100000000 (so 100M rather than 100K) and I now get 0.338s for the heap-profiled 
version versus 0.056s for the non-profiled version, and I assume this is within 
the expected slowdown caused by writing to disk etc.

Original comment by alex.fl...@gmail.com on 25 Aug 2010 at 5:56

Hmm, curious.  Another thing you changed with -pg is you took out the 
optimization.  Try with just CPPFLAGS="-g" ./configure and see how that looks.

} I don't have much experience at examining why this would be so but am happy to
} follow any instructions

If -pg doesn't prove to be helpful, you could also try linking in tcmalloc's 
profiler: add -lprofiler to the link line.  I believe you should get both a 
heap profile and cpu profile in that case, though I've never tried it.

Original comment by csilv...@gmail.com on 25 Aug 2010 at 6:48