Stack overflow for recursive type
jcaesar opened this issue · 5 comments
The following code
use arbitrary::{Arbitrary, Unstructured};
#[derive(Debug, Arbitrary)]
enum Nat {
Succ(Box<Nat>),
Zero,
}
fn main() {
Nat::arbitrary(&mut Unstructured::new(&[])).ok();
}
Results in a stack overflow
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
fish: Job 1, 'cargo run' terminated by signal SIGABRT (Abort)
I guess there isn't any data for the discriminant to go on, so it choses the first one. Upon which it needs to construct a Succ
, so it needs to construct a Nat
. And there's the infinite recursion.
This issue is similar to #30.
I have no practical use for this type, I just encountered it while helping someone on a Matrix channel. I know the issue can be side-stepped by reordering the variants.
This is caused by the behaviour of Unstructured::fill_buffer
. The docs say this:
If this Unstructured does not have enough underlying data to fill the whole buffer, it pads the buffer out with zeros.
This means that, for example, <u32 as Arbitrary>::arbitrary()
will always return 0
if the Unstructured
is exhausted. If an Arbitrary
implementation uses an integer as a discriminant, and the 0
case results in recursion, this recursion will be unbounded if the Unstructured
is exhausted.
fill_buffer()
should probably just fail if it can't actually produce meaningful data.
The derive should emit code that does something like
if u.is_empty() {
// generate the smallest variant of the enum according to `size_hint` or something
}
Let's get a little bit more pathological…
#[derive(Debug, Arbitrary)]
enum Nah {
Foo(Box<Nah>, Box<Nah>, Box<Nah>),
Bar([u128; 8], Vec<u8>),
}
- @fitzgen How do you plan on invoking
size_hint
for an enum variant? It can only be invoked for a type - Even if you could, the size might not help, as the recursing variant is not necessarily the biggest
- I thought you could get around this by not returning zeros when
Unstructured
has run out of data, but an arbitrary but predecided stream of random bytes, but alas, the following overflows:
use arbitrary::{Arbitrary, Unstructured};
use rand::prelude::*;
use rand_chacha::ChaCha8Rng;
fn main() {
let mut rng = ChaCha8Rng::seed_from_u64(2);
let mut vec = vec![0; 10_000_000];
rng.fill(&mut vec[..]);
println!("{:#?}", Nah::arbitrary(&mut Unstructured::new(&vec)).ok());
}
[Edit:] Had an idea here. Noticed it won't work. Deleted that part of the comment.
- I thought you could get around this by not returning zeros when
Unstructured
has run out of data, but an arbitrary but predecided stream of random bytes, but alas, the following overflows:fn main() { let mut rng = ChaCha8Rng::seed_from_u64(42); let mut vec = vec![0; 10_000_000]; rng.fill(&mut vec[..]); println!("{:#?}", Nah::arbitrary(&mut Unstructured::new(&vec)).ok()); }
Works fine here:
Finished dev [unoptimized + debuginfo] target(s) in 0.31s
Running `target/debug/play`
Some(
Foo(
Bar(
[
145476293461542000028807775544439050632,
98203408273056885871405466205373753170,
104820770727627121188480330252351799966,
262442360616484273216703569333952063620,
172477858353127110140320870355270578584,
325646913827089911432257323056979532707,
247997273994881466872382775152887726925,
195796141835375213508570553128308946852,
],
[
134,
],
),
Bar(
[
189230019281459675866692042116288544555,
10302462340851169927061008405705438995,
24434008895600876847358622365140990290,
8955178589941525854220301430770238484,
140787067354498870980068053962334499139,
212084145681883139936596538251040369090,
190380976950187836915899586137452386222,
39701820008063387079632896692182521625,
],
[],
),
Foo(
Bar(
[
144393945012608794761271592009849156610,
299966469165717547914851948579865824505,
262657219299653677302421682523772390291,
14395053008657570419420608430383144625,
318878744917687409621213322278755985101,
154148573594360293034161480903201738771,
198652894574571886794678961254589054462,
78547312457769649313774646773255028122,
],
[],
),
Bar(
[
139244656239713257046355550825827239774,
45174505172754008672682525664790785045,
5192048952276549987242251111937293972,
13545657949210434876638699673470301078,
233059012670395942644901893133911240919,
176464204661326593671800602846937621907,
209910084961443872669042730406692582941,
28947790981662737956959571844291394338,
],
[],
),
Bar(
[
293542515775296303396448842148348253683,
188633011601097512211155749059341157545,
176558060690305331944736596256194049543,
312132873124237769469972708238692011292,
218211164886327969114991078274546456891,
176747798327404585144315594476069336114,
258473544635761988794751479107835517079,
220471918399425442755240862426411403078,
],
[],
),
),
),
)
The only case in which I can plausibly see random data overflowing arbitrary()
is when all enum variants are unbounded recursive, and so no matter what you throw at it, it will always recurse.
Apologies, I gave the wrong seed. Try 2. (Or just looping through a few of them.)