hunter-ht-2018/ptfuzzer

Why is it so slow?

Opened this issue · 16 comments

When I use AFL natively (afl-gcc and afl-fuzz), I get 2000 executions per second for an example program.

When I use ptfuzzer for the same program and seed input files (but of course compiled with plain gcc -O3), I get fewer than 200 executions per second. That is a speed loss of 90%.

When I use afl-fuzz in QEMU mode (-Q), I get 300 executions per second.

Since Intel PT should add only 10-20% overhead to the native program's running time, something in ptfuzzer seems to be inefficient: a lot of time is lost somewhere in each execution.
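To put these rates in per-execution terms, here is a back-of-the-envelope sketch. It assumes executions run back to back (so time per execution is the inverse of the rate), and it roughly treats the afl-gcc build as the baseline a 20% PT overhead would apply to, even though that build carries AFL's own instrumentation.

```python
def ms_per_exec(execs_per_sec: float) -> float:
    """Average wall-clock milliseconds per execution, assuming executions run back to back."""
    return 1000.0 / execs_per_sec


rates = {
    "afl-gcc + afl-fuzz": 2000,
    "afl-fuzz -Q (QEMU)": 300,
    "ptfuzzer": 200,  # reported as "< 200", so this is a best case
}

for name, rate in rates.items():
    print(f"{name:20s} {ms_per_exec(rate):5.2f} ms/exec")

# Rough assumption: apply a 20% Intel PT overhead to the afl-gcc time.
expected = 1000.0 / (ms_per_exec(2000) * 1.20)
print(f"expected at 20% overhead: {expected:.0f} execs/s")
```

That puts ptfuzzer at 5 ms or more per execution versus 0.5 ms for the afl-gcc build, i.e. at least 4.5 ms of extra time per run, while a 20% overhead would predict roughly 1667 executions per second.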

Has someone looked into this?

@vanhauser-thc
What's your target program and your cmd line input?

@zhanggenex I am using unrar (apt source unrar) and compiled two binaries: one with -O3 and -no-pie (saved to ~/unrar-bin), and one with afl-gcc (saved to ~/unrar-afl). This has to be done by editing the makefile, because it does not honor the CXX or LDXX environment variables.

I created a simple .rar with "echo test > test.txt; rar a test.rar test.txt"

and the command line is: unrar p -inul

so:

afl-fuzz -i /tmp/in -o /tmp/out1 -- ~/unrar-afl p -inul @@

afl-fuzz -Q -i /tmp/in -o /tmp/out2 -- ~/unrar-bin p -inul @@

python ./bin/ptfuzzer.py "-i /tmp/in -o /tmp/out3" "~/unrar-bin p -inul "

thanks for looking into this!

@vanhauser-thc
PTfuzz decodes the PT data packets after every test case execution. The decoding process is the main cause of the performance overhead, because in our current implementation we have to pause the fuzzing process while waiting for decoding to finish. Ideally this could be parallelized with some effort, but unfortunately we are somewhat short of manpower to handle this problem.
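The difference between decoding after every execution and decoding in parallel can be sketched as a producer/consumer pipeline. This is a minimal Python sketch, not ptfuzzer's code: run_target and decode_trace are hypothetical placeholders, and a bounded queue.Queue hands raw traces from the fuzz loop to a decoder thread.

```python
import queue
import threading


# Hypothetical stand-ins, not ptfuzzer's API: a real PT fuzzer would run the
# target with tracing enabled and decode the raw PT packets into edges.
def run_target(test_case: bytes) -> bytes:
    return test_case[::-1]  # pretend this is the raw trace


def decode_trace(raw: bytes) -> frozenset:
    return frozenset(zip(raw, raw[1:]))  # pretend these are the decoded edges


def sequential(test_cases):
    """Current approach: the fuzz loop waits for each trace to be decoded."""
    return [decode_trace(run_target(tc)) for tc in test_cases]


def pipelined(test_cases):
    """Decoding runs in a second thread while the loop keeps executing."""
    traces = queue.Queue(maxsize=16)  # bounded: put() blocks if the decoder falls behind
    results = []

    def decoder():
        while True:
            raw = traces.get()
            if raw is None:  # sentinel: no more traces
                break
            results.append(decode_trace(raw))

    worker = threading.Thread(target=decoder)
    worker.start()
    for tc in test_cases:
        traces.put(run_target(tc))  # hand the trace off instead of waiting for decoding
    traces.put(None)
    worker.join()
    return results
```

Both functions return the same coverage results in the same order; the only difference is whether the loop blocks on decoding. Two caveats: in CPython the GIL would keep pure-Python decoding from actually running in parallel, so a real decoder would be native code or a separate process; and since a fuzzer decides whether to keep a test case based on its coverage, a pipelined design has to hold on to each test case until its trace has been decoded.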

@vanhauser-thc Does unrar unintentionally write to disk during fuzzing?

@zhouxucs It does not; the command I use writes only to stdout.
I verified this with a taint tracer I am writing (not for your question, but while analyzing something else).

Here is a new paper: https://arxiv.org/pdf/1905.10499.pdf

It's also about an Intel PT fuzzer, but it claims much better performance than yours.
The main reason seems to be that they do the PT decoding in parallel, in a second process, as soon as the events come in. So of course it is much faster, but it also costs twice the CPU.

It is not released yet. Once the code is released, maybe you can learn from it and optimize yours?

@vanhauser-thc Thank you for the information. AFL-PT did excellent work. It leverages multithreading and on-the-fly decoding to improve performance, which is very helpful when exploiting deep paths. We will study this method.

@vanhauser-thc Hi, could you provide a link to the unrar you tested? I cannot find it with apt source unrar.

@zhouxucs apt-get source unrar worked for me, but I will attach it here:
unrar-nonfree-5.6.6.tar.gz

@vanhauser-thc I have tested fuzzing unrar: ptfuzzer gets 500 executions per second, while AFL gets 2000 executions per second. The slowdown is partly due to the forkserver. When I disable the forkserver in AFL, its speed drops to 1000. Unfortunately, ptfuzzer does not currently support a forkserver.
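These measurements can be split into per-execution times to see how much of the gap the forkserver accounts for. A rough sketch, assuming the missing forkserver costs ptfuzzer about as much per execution as it costs AFL; whatever remains is then attributed to PT-specific work (tracing, decoding, etc.).

```python
# Per-execution times in milliseconds, from the measured rates above.
afl_forkserver_ms = 1000.0 / 2000     # AFL with forkserver: 0.5 ms
afl_no_forkserver_ms = 1000.0 / 1000  # AFL without forkserver: 1.0 ms
ptfuzzer_ms = 1000.0 / 500            # ptfuzzer (no forkserver): 2.0 ms

total_gap_ms = ptfuzzer_ms - afl_forkserver_ms  # 1.5 ms slower per exec

# Assumption: the missing forkserver costs ptfuzzer about as much as it costs AFL.
forkserver_gap_ms = afl_no_forkserver_ms - afl_forkserver_ms  # 0.5 ms

# Whatever is left would be PT-specific (tracing, decoding, ...).
pt_gap_ms = ptfuzzer_ms - afl_no_forkserver_ms  # 1.0 ms

print(f"forkserver explains {forkserver_gap_ms / total_gap_ms:.0%} of the gap")
```

Under that assumption, the forkserver explains about a third of the 1.5 ms per-execution gap, and the remaining 1.0 ms would be the PT-related part.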

The other part of the slowdown may be that unrar is a multithreaded program. Multithreading is expected to slow down PT decoding further. This is not confirmed, because I currently cannot configure unrar to run in single-threaded mode.

By the way, I don't think the edges collected by AFL's instrumented code are correct for multithreaded programs.

OK, I spent a lot of time looking into Intel PT and using it for fuzzing.

Although ptfuzzer seems slower than afl-pt in speed comparisons, it is not. afl-pt does the decoding in parallel and therefore needs 2x the CPU power, whereas ptfuzzer does the decoding after each fuzz case and therefore needs just 1x. If this is factored in (because you can run two ptfuzzer instances with the same time and resources as one afl-pt instance), then ptfuzzer is quite a bit faster.
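The per-CPU argument can be made concrete with a simple throughput model. This is an idealized sketch, not measured data: it assumes ptfuzzer spends t_exec + t_decode per test case on one core, while afl-pt overlaps the two stages on two cores, so it is limited by the slower stage and its throughput is split over two cores. The timings in the example are made up.

```python
def sequential_per_core(t_exec_ms: float, t_decode_ms: float) -> float:
    """ptfuzzer-style: one core runs the target, then decodes its trace.
    Returns executions per second per core."""
    return 1000.0 / (t_exec_ms + t_decode_ms)


def parallel_per_core(t_exec_ms: float, t_decode_ms: float) -> float:
    """afl-pt-style: one core executes while a second core decodes.
    The slower stage limits throughput; the result is divided over 2 cores."""
    return 1000.0 / max(t_exec_ms, t_decode_ms) / 2


# Illustrative (made-up) timings: 1 ms to run the target, 3 ms to decode.
print(sequential_per_core(1.0, 3.0))  # 250.0
print(parallel_per_core(1.0, 3.0))    # ~166.7
```

Because t_exec + t_decode is never more than 2 * max(t_exec, t_decode), the sequential design never loses per core in this model; the two tie only when both stages take equally long. The model ignores synchronization cost and resource contention, so it is only a rough check on the argument.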

Overall, however, Intel PT is suboptimal for fuzzing, as PT decoding is very slow regardless of which method is used (efficient decoding, partial decoding, heuristics, etc.).

qemu_mode is faster in comparison, so on Linux PT does not make sense. On Windows, however, Intel PT could be an option, as QEMU on Windows can only emulate a whole OS, not single binaries as on Linux. There is also no afl-dyninst implementation for Windows, so Windows would be the ideal target platform for Intel PT based fuzzing.

@vanhauser-thc Has any Windows fuzzer with PT support been released?

@zhanggenex Not that I know of. It would help, as the only effective fuzzer for Windows that I know of is WinAFL, and that uses DynamoRIO. So having something with Intel PT would really be of value, I think.

@vanhauser-thc sure it is, I agree.

@vanhauser-thc @zhanggenex @zhouxucs

Hi all, just FYI. It seems that a new Intel PT decoding library has been released here. Anyone interested may want to take a look.

I am not the developer of libxdc, but their experimental results seem convincing. Hence, I guess it may be a good choice to migrate ptfuzzer to this library?