Controlling size for nested data structures?
mikera opened this issue · 5 comments
Love the library! It is proving very useful. Quick question....
SITUATION: I'm writing some custom generators for data structures that potentially need to generate multiple nested structures of the same type.
PROBLEM: There is a certain threshold where the sizes of the generated data structures seem to explode exponentially, probably because the expected number of children increases at each level (which in turn have a larger expected number of children themselves...)
QUESTION: Is there a good way to control the sizes of nested structures? Ideally if I am generating a data structure of size n
with m
children, it seems like it would be sensible for each child to be generated with size n/m
but it's not quite clear how to do this?
@mikera Thanks for checking out junit-quickcheck. I don't plan to develop on it any further -- you may want to investigate Jqwik by @jqwik-team.
That said, I'd be interested to check out your custom generators. Can you post some code?
Thanks for the reference, I will take a look!
Here's an example of a fairly minimal generator that blows up (stack overflow) while creating a proportion of nested instances of the same type (ArrayLists of ArrayLists and Longs):
import java.util.ArrayList;
import com.pholser.junit.quickcheck.generator.GenerationStatus;
import com.pholser.junit.quickcheck.generator.Generator;
import com.pholser.junit.quickcheck.random.SourceOfRandomness;
/**
* Generator for arbitrary trees of Longs
*/
public class BadListGen extends Generator<Object> {
public BadListGen() {
super(Object.class);
}
@Override
public Object generate(SourceOfRandomness r, GenerationStatus status) {
int size=status.size()+1;
int type = r.nextInt(10);
switch (type) {
case 0:
int n = r.nextInt(size);
ArrayList<Object> al=new ArrayList<>();
for (int i=0; i<n; i++) {
al.add(generate(r,status));
}
return al;
default:
Long l=r.nextLong(-size, size);
return l;
}
}
}
Ideally, I think the size in the nested call to generate needs to be reduced to something like size/n
on each iteration? The intention would be that the size controls the overall number of elements in the tree, rather than the size of each child
@mikera Ah yes, I see what you mean. Sadly, I didn't make accessible GenerationStatus
derivatives that would allow you to futz with size, etc. Your best bet may be to make a GenerationStatus
subclass where you can control/modulate size in the manner you wish, and you can feed to your recursive calls. There is an internal-but-maybe-still-accessible AbstractGenerationStatus
class you can use to get started, if you don't want to implement the interface directly.
@mikera What do you think about something like this?
public class BadListGen extends Generator<Object> {
public BadListGen() {
super(Object.class);
}
@Override
public Object generate(SourceOfRandomness r, GenerationStatus status) {
List<Object> root = new ArrayList<>();
for (int i = 0; i < status.size(); ++i) {
root.add(_generate(r, status.size(), 1));
}
return root;
}
private Object _generate(SourceOfRandomness r, int size, int level) {
int type = r.nextInt(10);
if (type == 0) {
int numberOfChildren = size == 0 ? 0 : r.nextInt(size);
List<Object> children = new ArrayList<>();
for (int i = 0; i < numberOfChildren; i++) {
children.add(_generate(r, size / (level + 1), level + 1));
}
return children;
}
return r.nextLong(-size, size);
}
}
Thanks @pholser for taking a look - your suggestion works well in simple cases
Only slight challenge is if the generator is parameterised and needs to generate something (e.g. leaf nodes) with a generic Generator
that doesn't have a _generate
and therefore needs a GenerationStatus
. Perhaps I can play around with some clever subclassing to solve this :-)
Perhaps an interesting research topic for someone would be exploring how test data generators can be structured to provide provable O(size)
construction time and space.