[QUESTION] Is it possible to use radamsa as a library?
joxeankoret opened this issue · 23 comments
Hi!
I would like to embed radamsa in a few different places as a library instead of having to call a binary on the command line from my own fuzzers. Is there a (recommended) way of doing so?
Thanks in advance!
Hi,
You are probably thinking about a scenario where the linked radamsa would constantly get something to mutate? Supporting that kind of use is something I've been planning to add for a while now.
You can already start radamsa with a fixed set of samples to serve a stream of fuzzed data (radamsa -n inf -o 31337 samples/*), and embed a trivial wrapper which just grabs the next testcase from localhost:31337. The trouble is that you can't extend the sample set on the fly this way.
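For illustration, such a wrapper could look roughly like the Python sketch below. It assumes the radamsa server sends exactly one testcase per accepted connection and then closes it, so reading until end-of-stream yields one complete testcase. The function name fetch_testcase and its timeout parameter are made up for this example and are not part of radamsa.

```python
import socket


def fetch_testcase(host="localhost", port=31337, timeout=5.0):
    """Return one fuzzed testcase (bytes) from a radamsa TCP server.

    Assumption: the server writes a single testcase per connection and
    then closes the socket, so we read until recv() returns b"".
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:  # peer closed the connection: testcase complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

A fuzzer loop would then call fetch_testcase() once per iteration and feed the returned bytes to the target. Every call pays for a TCP connection, which is the overhead an in-process library would avoid.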
Yes, something similar. Right now I'm using my own port of this: https://github.com/trailofbits/grr/tree/master/third_party/radamsa
...and I have written a Python wrapper for it. I have also used the socket mode for other things, but it is not actually what I want.
Grr seems to use a fun approach. The only issue I have with it is that when you run radamsa once for each output, it doesn't get a chance to collect data about the inputs. Therefore some mutations which may be useful will never occur. There should be some way to either pass state between the runs, or run radamsa in a separate process and keep the state there, which is why the TCP mode was originally added.
A few solutions come to mind:
- add an incremental mode to radamsa, so that it can store information gathered so far between runs, after which something like the current wrapper would work better
- extend the TCP mode to handle the kind of use required in a library setting, and maybe bundle functions along which start a background radamsa automatically and call it
- the same, but with stdio redirection to a background process
- add support for librarization upstream to owl, and get libradamsa as a result
Are there reasons why the background process approach doesn't work well in your test setups?
The main reason is performance. Network sockets simply can't compete with an in-memory mutation engine running independently on each of N machines. The other reason is that the same mutation engine often needs to use different data sources, which makes it necessary to open a listening socket (i.e., a listening radamsa instance) for each and every format I want to fuzz. An easy example: PDFs. A single PDF document has a lot of different "formats" inside it.
The last one is probably the correct solution. Owl should have some builtin support for building programs to be used as libraries in other C-programs. Then it would be possible to run radamsa incrementally without losing state from within one process.
Current plan: radamsa (and owl programs in general) work by decoding the program to run from a fasl image, encoding the command line parameters as a corresponding lisp object, running the program on that data and returning the (likely integer) value the program returns. When used as a library, the programs could have a boot/init function to decode the image, and you could then correspondingly have a lisp object -> lisp object library call for any compiled function, which automatically encodes and decodes the object. This way the same heap state could be reused across calls, and the library function being called would remain a purely functional one, even with state.
In practice, you'd link a libradamsa and have something like radamsa(void *ptr, size_t s, &result, &result_size).
That would be perfect! One thing: it would be great to be able to set the seed too. Something like void radamsa_init(unsigned long long seed);?
That would end up working without radamsa-specific modifications in the planned solution, because you'd initially boot up the embedded radamsa anyway with a fake command line, on which you can give the seed and other settings as usual.
owl-lisp/owl#15 is waiting for spare time. I got DoS'd by various kinds of extra work in December.
I understand, don't worry :)
Oh, by the way, https://github.com/aoh/ni might be of interest here. It's a quick port of some radamsa-mutations to C, which should be easy to embed.
Thanks! But it doesn't look comparable. It seems "ni" doesn't try to infer the grammar from the inputs.
@aoh Could you explain this:
"That would end up working without radamsa-specific modifications in the planned solution, because you'd initially boot up the embedded radamsa anyway with a fake command line, on which you can give the seed and other settings as usual."
If we run radamsa seed.pdf -o mutated.pdf every time for each seed, are we missing out on radamsa-specific mutations?
What about running radamsa -r seeds/*.pdf -o mutated.pdf every time we want a mutated sample? Are we still missing out on radamsa-specific mutations?
In both cases yes, though not by much. Some mutations are only possible if radamsa has had a chance to look at another file, or the same file from a different position. If you have sample files with '' and '', then the first output will never have something like '', because radamsa only learned about one of the attributes while generating the first fuzzed output.
As a workaround, if it's not easy to make sets of files at a time for testing, you can add --seek 2 especially in the latter case to your existing test scripts to allow more cross pollination between sample files. This won't be necessary after issue #24 is fixed, but you still should consider making sets of files at a time to allow radamsa to filter out duplicate testcases.
@aoh thank you very much for the explanation. In your opinion, what would be the best way to run radamsa when I have n seeds in a directory called seeds and I want to get the full benefit of radamsa's mutation and generation, considering that I need to get the mutated sample both by giving a single seed file and by using radamsa's recursive mode?
My current setup is:
- seeds in a seed folder
a. mutation using the seed file. Every time, I run radamsa seed.pdf -o mutated.pdf to get a mutated sample
b. mutation using recursive mode. Every time, I run radamsa -r ./seeds/*.pdf -o mutated.pdf to get a mutated sample
In both cases you could generate a bunch of files and serve them as they are needed. Something like
# make a directory for fuzzed files if necessary
mkdir -p fuzzed
# if no fuzzed files are left, generate a new batch into fuzzed/
ls fuzzed | grep -q radamsa || radamsa -n 1000 -o fuzzed/radamsa-%n.out seeds/*
# hand out the next file (ls prints bare names, so prefix the directory)
mv "fuzzed/$(ls fuzzed | head -n 1)" mutated.pdf
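If the test harness is written in Python rather than shell, the same batch-and-serve idea might be sketched as follows. The function names radamsa_refill and next_testcase are hypothetical helpers invented for this example. The only radamsa options used are the ones from the snippet above (-n, and -o with the %n placeholder). The refill step is passed in as a parameter so it can be swapped out, for instance in tests on a machine without radamsa installed.

```python
import shutil
import subprocess
from pathlib import Path


def radamsa_refill(pool, seeds, count=1000):
    """Generate a batch of fuzzed files with one radamsa run over all seeds.

    Assumes a `radamsa` binary is on PATH.
    """
    seed_files = sorted(str(p) for p in Path(seeds).iterdir() if p.is_file())
    out_pattern = str(Path(pool) / "radamsa-%n.out")
    subprocess.run(
        ["radamsa", "-n", str(count), "-o", out_pattern, *seed_files],
        check=True,
    )


def next_testcase(pool, seeds, dest, refill=radamsa_refill):
    """Move the next pre-generated testcase to `dest`, refilling when empty."""
    pool = Path(pool)
    pool.mkdir(parents=True, exist_ok=True)
    if not any(pool.glob("radamsa-*")):
        refill(pool, seeds)
    # Lexicographically first file, like `ls | head -n 1`;
    # min() raises ValueError if the refill produced nothing.
    first = min(pool.glob("radamsa-*"))
    shutil.move(str(first), str(dest))
    return Path(dest)
```

Calling next_testcase("fuzzed", "seeds", "mutated.pdf") before each test run then behaves like the shell version: radamsa only runs when the pool is empty, so each batch is generated in a single run over all the seeds.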
First tests of calling some compiled lisp code from C passed. The code looks roughly like what was planned above: https://github.com/aoh/owl-lisp/blob/develop/c/lib.c#L49
Next step is to add a suitable function to radamsa and try it out from C using some similar wrapper.
Can't wait to have a working version. Do you have an estimated date to start testing it?
Thanks a lot!
I need it as a library too... hope to see this as soon as possible
In case this is still relevant, the fix is mostly done at https://gitlab.com/akihe/radamsa/issues/28 . Next versions will likely have a libradamsa.c and radamsa.h.
That's awesome! Thank you very much!
Fantastic!!
Thank you!