[QUESTION] Is it possible to use radamsa as a library?

joxeankoret opened this issue 8 years ago · 23 comments

Hi!

I would like to embed radamsa in a few different places as a library instead of having to call a binary on the command line from my own fuzzers. Is there a (recommended) way of doing so?

Thanks in advance!

Answer 1 · 2016-11-24T13:05:23.000Z

Hi,

You are probably thinking about a scenario where the linked radamsa would constantly get something to mutate? Supporting that kind of use is something I've been planning to add for a while now.

You can already start radamsa with a fixed set of samples to serve a stream of fuzzed data (radamsa -n inf -o :31337 samples/*), and embed a trivial wrapper which just grabs the next test case from localhost:31337. The trouble is that you can't extend the sample set on the fly this way.
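Such a wrapper might look roughly like the following POSIX C sketch. It assumes radamsa is running as a TCP server on the loopback interface (started with an output of the form -o :31337) and that, as when you test it with nc, it writes one test case per connection and then closes it. The names next_testcase and read_all are hypothetical; they are not part of radamsa.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read from fd until EOF into a growing heap buffer.
   Returns the buffer (caller frees) and stores its length in *out_len,
   or returns NULL on a read or allocation error. */
static unsigned char *read_all(int fd, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        if (len == cap) {                      /* grow the buffer when full */
            unsigned char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n < 0) {
            free(buf);
            return NULL;
        }
        if (n == 0)                            /* EOF: the peer closed */
            break;
        len += (size_t)n;
    }
    *out_len = len;
    return buf;
}

/* Hypothetical wrapper: fetch the next fuzzed test case from a radamsa
   TCP server listening on 127.0.0.1:port. Caller frees the result. */
unsigned char *next_testcase(unsigned short port, size_t *out_len)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return NULL;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return NULL;
    }
    unsigned char *data = read_all(fd, out_len);
    close(fd);
    return data;
}
```

A fuzzer loop would call next_testcase(31337, &len), feed the bytes to the target and free() them. As noted above, the sample set is fixed once the server is started.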

Answer 2 · 2016-11-24T13:09:48.000Z

Yes, something similar. Right now I'm using my own port of this: https://github.com/trailofbits/grr/tree/master/third_party/radamsa

...and I have written a Python wrapper for it. I have also used the socket mode for other things, but it is not actually what I want.

Answer 3 · 2016-11-25T07:21:14.000Z

Grr seems to use a fun approach. The only issue I have with it is that when you run radamsa once for each output, it doesn't get a chance to collect data about the inputs. Therefore some mutations which may be useful will never occur. There should be some way to either pass state between the runs, or run radamsa in a separate process and keep the state there, which is why the TCP mode was originally added.
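For comparison, the run-once-per-output approach can be sketched in C as a helper that pipes one input through a fresh child process and collects what it prints. This is a POSIX sketch and run_filter is a hypothetical name. Because every call starts a new process, whatever radamsa learns about the input is discarded when the child exits, which is exactly the drawback described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv (argv[0] is looked up in PATH) with `in` as its stdin and return
   everything the child writes to stdout in a heap buffer (caller frees).
   Returns NULL on any failure or on a non-zero exit status. The input is
   staged in a temporary file rather than a pipe, so a large input cannot
   deadlock against a child blocked on a full output pipe. */
unsigned char *run_filter(char *const argv[], const unsigned char *in,
                          size_t in_len, size_t *out_len)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return NULL;
    if (fwrite(in, 1, in_len, tmp) != in_len || fflush(tmp) != 0 ||
        lseek(fileno(tmp), 0, SEEK_SET) != 0) {
        fclose(tmp);
        return NULL;
    }

    int out[2];
    if (pipe(out) != 0) {
        fclose(tmp);
        return NULL;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(out[0]);
        close(out[1]);
        fclose(tmp);
        return NULL;
    }
    if (pid == 0) {                  /* child: stdin <- temp file, stdout -> pipe */
        dup2(fileno(tmp), STDIN_FILENO);
        dup2(out[1], STDOUT_FILENO);
        close(out[0]);
        close(out[1]);
        execvp(argv[0], argv);
        _exit(127);                  /* only reached if exec failed */
    }

    close(out[1]);                   /* parent keeps only the read end */
    fclose(tmp);

    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    int ok = buf != NULL;
    while (ok) {
        if (len == cap) {            /* grow the buffer when full */
            unsigned char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                ok = 0;
                break;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(out[0], buf + len, cap - len);
        if (n < 0)
            ok = 0;
        else if (n == 0)
            break;                   /* EOF: child closed its stdout */
        else
            len += (size_t)n;
    }
    close(out[0]);                   /* a child still writing now gets EPIPE instead of blocking */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        ok = 0;
    if (!ok) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

With char *args[] = {"radamsa", "--seed", "42", NULL}; each call would yield one fuzzed variant of the input, each produced by a separate radamsa process.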

A few solutions come to mind:

a. add an incremental mode to radamsa, so that it can store the information gathered so far between runs, after which something like the current wrapper would work better
b. extend the TCP mode to handle the kind of use required in a library setting, and maybe bundle along functions that start a background radamsa automatically and call it
c. the same, but with stdio redirection to a background process
d. add support for librarization upstream to owl, and get libradamsa as a result

Are there particular reasons why the background process approach doesn't work well in your test setups?

Answer 4 · 2016-11-25T09:54:23.000Z

The main reason is performance. An in-memory mutation engine running independently on each of N machines is far faster than fetching test cases over network sockets. The other reason is that the same mutation engine often needs to work with different data sources, which would require opening a separate listening socket (i.e., a separate listening radamsa instance) for each and every format I want to fuzz. An easy example: PDFs, which use many different "formats" inside a single document.

Answer 5 · 2016-11-26T14:17:10.000Z

The last one is probably the correct solution. Owl should have some builtin support for building programs to be used as libraries in other C-programs. Then it would be possible to run radamsa incrementally without losing state from within one process.

Answer 6 · 2016-11-30T06:56:58.000Z

Current plan: radamsa (and owl programs in general) work by decoding the program to run from a fasl image, encoding the command-line parameters as a corresponding lisp object, running the program on that data, and returning whatever the program returns, which is usually an integer. When used as a library, programs could have a boot/init function to decode the image, and you could then have a corresponding lisp object -> lisp object library call for any compiled function, which automatically encodes and decodes the objects. This way the same heap state could be reused across calls, and the called library function could even remain purely functional while still carrying state.

In practice, you'd link a libradamsa and have something like radamsa(void *ptr, size_t s, &result, &result_size).
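To make that calling convention concrete, here is a sketch of its shape. Everything in it is hypothetical: toy_init and toy_mutate are invented names, and the body flips one random bit using a xorshift64 PRNG purely as a stand-in, so it does not reproduce radamsa's mutations. What it shows is the interface: bytes in, a newly allocated result plus its size out, and state that persists between calls inside one process (here only the PRNG state, standing in for the embedded heap).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Persistent library state. In the plan above this would be the embedded
   program's heap; here it is only a PRNG state. */
static uint64_t toy_state = 0x9E3779B97F4A7C15ULL;

/* Hypothetical init: seed the state (xorshift needs a non-zero state). */
void toy_init(uint64_t seed)
{
    toy_state = seed != 0 ? seed : 0x9E3779B97F4A7C15ULL;
}

/* One xorshift64 step (Marsaglia's 13/7/17 shift triple). */
static uint64_t toy_rand(void)
{
    toy_state ^= toy_state << 13;
    toy_state ^= toy_state >> 7;
    toy_state ^= toy_state << 17;
    return toy_state;
}

/* Hypothetical call with the proposed shape: bytes in, newly allocated
   result and its size out (caller frees *result). The "mutation" is a
   single random bit flip, a stand-in for radamsa's real mutators.
   Returns 0 on success, -1 if allocation fails. */
int toy_mutate(const void *ptr, size_t s, unsigned char **result,
               size_t *result_size)
{
    unsigned char *out = malloc(s > 0 ? s : 1);
    if (out == NULL)
        return -1;
    if (s > 0) {
        memcpy(out, ptr, s);
        size_t pos = (size_t)(toy_rand() % s);
        out[pos] ^= (unsigned char)(1u << (toy_rand() % 8));
    }
    *result = out;
    *result_size = s;
    return 0;
}
```

Because the state lives in the library rather than in a short-lived process, successive calls continue from where the previous one left off, and re-initializing with the same seed reproduces the same outputs.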

Answer 7 · 2016-11-30T15:55:30.000Z

That would be perfect! One thing: it would be great to be able to set the seed too. Something like void radamsa_init(unsigned long long seed);?

Answer 8 · 2016-12-04T07:56:25.000Z

That would end up working without radamsa-specific modifications in the planned solution, because you'd initially boot up the embedded radamsa anyway with a fake command line, on which you can give the seed and other settings as usual.
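Following that idea, a radamsa_init(seed) like the one requested above could amount to building such a fake command line and handing it to the embedded program's boot function. The boot function is not shown, since no such API existed at this point in the thread; build_boot_argv is a hypothetical helper, and only the --seed option itself is radamsa's.

```c
#include <stdio.h>

/* Hypothetical helper: fill argv with the fake command line an embedded
   radamsa might be booted with, i.e. {"radamsa", "--seed", "<seed>", NULL}.
   seed_buf receives the decimal seed and must outlive argv.
   Returns argc (3), or -1 if seed_buf is too small. */
int build_boot_argv(unsigned long long seed, char *seed_buf, size_t buf_len,
                    char *argv[4])
{
    int w = snprintf(seed_buf, buf_len, "%llu", seed);
    if (w < 0 || (size_t)w >= buf_len)
        return -1;
    argv[0] = "radamsa";
    argv[1] = "--seed";      /* radamsa's own seed option */
    argv[2] = seed_buf;
    argv[3] = NULL;
    return 3;
}
```

Other settings would be appended to the same argv in the usual command-line form, which is why no radamsa-specific init function is needed.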

Answer 9 · 2016-12-21T09:43:15.000Z

owl-lisp/owl#15 is waiting for spare time. I got DoS'd by various kinds of extra work in December.

Answer 10 · 2016-12-21T09:48:37.000Z

I understand, don't worry :)

Answer 11 · 2017-02-10T13:57:31.000Z

Oh, by the way, https://github.com/aoh/ni might be of interest here. It's a quick port of some radamsa-mutations to C, which should be easy to embed.

Answer 12 · 2017-02-10T14:07:25.000Z

Thanks! But it doesn't look comparable. It seems "ni" doesn't try to infer the grammar from the inputs.

Answer 13 · 2017-06-01T08:53:19.000Z

@aoh Could you explain this:

That would end up working without radamsa-specific modifications in the planned solution, because you'd initially boot up the embedded radamsa anyway with a fake command line, on which you can give the seed and other settings as usual.

If we run radamsa seed.pdf -o mutated.pdf separately for each seed, are we missing out on radamsa-specific mutations?

What about running radamsa -r seeds/*.pdf -o mutated.pdf every time we want a mutated sample: are we still missing out on radamsa-specific mutations?

Answer 14 · 2017-06-01T09:43:53.000Z

In both cases yes, though not by much. Some mutations are only possible if radamsa has had a chance to look at another file, or the same file from a different position. If you have sample files with '' and '', then the first output will never have something like '', because radamsa only learned about one of the attributes while generating the first fuzzed output.

As a workaround, if it's not easy to make sets of files at a time for testing, you can add --seek 2 to your existing test scripts, especially in the latter case, to allow more cross-pollination between sample files. This won't be necessary after issue #24 is fixed, but you should still consider making sets of files at a time so that radamsa can filter out duplicate test cases.

Answer 15 · 2017-06-01T09:50:56.000Z

@aoh thank you very much for the explanation. In your view, what is the best way to run radamsa when I have n seeds in a directory called seeds and want the full benefit of radamsa's mutation and generation, given that I need to get each mutated sample by passing a seed file, and that I also use radamsa's recursive mode?

My current setup is:

Seeds are in a seeds folder.

a. Mutation using a single seed file: each time I need a mutated sample, I run radamsa seed.pdf -o mutated.pdf
b. Mutation using recursive mode: each time I need a mutated sample, I run radamsa -r ./seeds/*.pdf -o mutated.pdf

Answer 16 · 2017-06-01T14:51:26.000Z

In both cases you could generate a bunch of files and serve them as they are needed. Something like:

```sh
# make a directory for fuzzed files if necessary
mkdir -p fuzzed
# generate a new batch if no fuzzed files are left
ls fuzzed | grep -q radamsa || radamsa -n 1000 -o fuzzed/radamsa-%n.out seeds/*
# hand out the next file
mv "fuzzed/$(ls fuzzed | head -n 1)" mutated.pdf
```
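The same generate-in-batches idea can live inside a C fuzzer as a small pool helper, so the harness only shells out to radamsa when the pool runs dry. This is a POSIX sketch; take_next_case is a hypothetical name, and its dir and prefix arguments play the roles of fuzzed and radamsa- in the shell snippet.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Move one pre-generated test case whose name starts with `prefix` from
   `dir` to `dest`. Returns 0 on success, 1 if no matching file is left
   (time to run radamsa -n 1000 ... again), or -1 on error.
   Files are taken in readdir() order, which is unspecified, and rename()
   only works when dir and dest are on the same filesystem. */
int take_next_case(const char *dir, const char *prefix, const char *dest)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    size_t plen = strlen(prefix);
    int rc = 1;                       /* assume the pool is empty */
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strncmp(e->d_name, prefix, plen) != 0)
            continue;                 /* not one of the generated files */
        char path[4096];
        int w = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (w < 0 || (size_t)w >= sizeof path)
            rc = -1;                  /* path would be truncated */
        else
            rc = rename(path, dest) == 0 ? 0 : -1;
        break;
    }
    closedir(d);
    return rc;
}
```

Generating in batches also keeps one radamsa process alive across many outputs, which preserves the cross-pollination and duplicate filtering discussed above.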

Answer 17 · 2017-08-08T22:15:55.000Z

First tests of calling some compiled lisp code from C passed. The code looks roughly like what was planned above: https://github.com/aoh/owl-lisp/blob/develop/c/lib.c#L49

Next step is to add a suitable function to radamsa and try it out from C using some similar wrapper.

Answer 18 · 2017-09-24T11:20:49.000Z

Can't wait to have a working version. Do you have an estimated date to start testing it?

Thanks a lot!

Answer 19 · 2017-09-29T02:51:12.000Z

Need it as a library too... hope to see this as soon as possible.

Answer 20 · 2019-09-20T08:32:16.000Z

In case this is still relevant, the fix is mostly done at https://gitlab.com/akihe/radamsa/issues/28. Next versions will likely have a libradamsa.c and radamsa.h.

Answer 21 · 2019-09-20T08:51:28.000Z

That's awesome! Thank you very much!

Answer 22 · 2019-09-20T08:53:26.000Z

Fantastic!!

Answer 23 · 2019-09-20T18:01:24.000Z

Thank you!