Tune the generator by randomly selecting sub-languages to explore

Question

Opened this issue 6 years ago · 0 comments

This is a tracking issue for the idea raised in a comment on #2:

The idea of "commenting out all non-flat definitions" corresponds to a common idea in the QuickCheck world, which is that when you have N combinators, it can be more efficient to randomly select a smaller group of K<N combinators and test with only that (for example, "let's only use float operations"). Eventually we should try to get this logic into the program generator, by selecting a (coherent) subset of the initial environment instead of always using the full environment. One easy way to do this would be to define separate environments for separate "features" that we want to test (int, string, float, lists, etc.), and compose the initial environment as a union/concatenation of a randomly-selected subset of feature-specific environments.

More generally, the idea is that we could define "sub-languages" that our generator can target (ints and int operations, floats and float operations, strings and string operations, lists and list operations, etc.), and that bug-finding is probably going to be more effective if it sometimes works in only one sub-language or in a small number of combined sub-languages. Otherwise, as the set of features we cover grows, it becomes very unlikely that, for example, several floating-point operations are going to be combined in interesting ways: there will always be operations at other types in the middle.
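To make the "union of randomly-selected feature environments" idea concrete, here is a minimal OCaml sketch. Everything in it is an assumption made for illustration: the `ty` type, the `feature` record, the three example environments, and the 1/2 selection probability are hypothetical and are not taken from the generator's actual code.

```ocaml
(* Hypothetical model of types and features; the real generator may differ. *)
type ty = Int | Float | String | Bool | Arrow of ty * ty

type feature = {
  name : string;
  env : (string * ty) list;  (* operator name paired with its type *)
}

let int_feature = {
  name = "int";
  env = [ ("(+)", Arrow (Int, Arrow (Int, Int)));
          ("succ", Arrow (Int, Int)) ];
}

let float_feature = {
  name = "float";
  env = [ ("(+.)", Arrow (Float, Arrow (Float, Float)));
          ("sqrt", Arrow (Float, Float)) ];
}

let string_feature = {
  name = "string";
  env = [ ("(^)", Arrow (String, Arrow (String, String)));
          ("string_of_int", Arrow (Int, String)) ];
}

let all_features = [ int_feature; float_feature; string_feature ]

(* Keep each feature independently with probability 1/2 (an arbitrary
   choice), and retry when the draw selects nothing, so that a non-empty
   feature list always yields a non-empty selection. *)
let rec select_features rng = function
  | [] -> []
  | features ->
    (match List.filter (fun _ -> Random.State.bool rng) features with
     | [] -> select_features rng features
     | selected -> selected)

(* The initial environment is the concatenation of the selected
   feature-specific environments. *)
let initial_env rng =
  let selected = select_features rng all_features in
  List.concat (List.map (fun f -> f.env) selected)
```

Calling `initial_env (Random.State.make [| 42 |])` would give one such reduced environment; a test run could draw a fresh one per generated program. Uniform coin flips are only the simplest option here, and weighting small subsets more heavily might suit the "only float operations" use case better.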

Random (no pun intended) thoughts:

A sub-language is defined by slightly more than just the initial typing environment: the choice of sub-language may influence the type generation.
When combining sub-languages, one must ensure that the operations that bridge between them exist. For example, string_of_int must either belong to the int sub-language, or to the string sub-language, or be added when the two are enabled at the same time. (I think it's fine if sub-languages introduce operators that use types from another sub-language; they will just never get picked by the type-aware generator unless that other sub-language is enabled. So in that example I would include string_of_int in the string feature set.)
It may be interesting to select a sub-language dynamically as part of the generation logic; for example, a program could contain a pair, the first element being in the float+int sub-languages and the second being in string+bool. I don't know how to combine this with the current two-phase process of first generating a type for the whole expression and then generating terms for it, so maybe this idea will never see the light of day.
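The remark about bridging operators such as string_of_int can be illustrated with a small OCaml sketch. It assumes a simplified eligibility rule, namely that an operator can be picked only when every base type in its signature belongs to an enabled sub-language; the `ty` type and the `base_types`, `eligible`, and `names` helpers are hypothetical, and the actual type-aware generator may decide this differently.

```ocaml
(* Hypothetical model; the real generator's types may differ. *)
type ty = Int | Float | String | Bool | Arrow of ty * ty

(* Collect the base types occurring anywhere in a signature. *)
let rec base_types = function
  | Arrow (a, b) -> base_types a @ base_types b
  | t -> [ t ]

(* Assumed rule: an operator is eligible only if all of its base types
   are among the enabled ones. *)
let eligible enabled (_name, signature) =
  List.for_all (fun t -> List.mem t enabled) (base_types signature)

(* The string feature set, with string_of_int included as suggested above. *)
let string_env = [
  ("(^)", Arrow (String, Arrow (String, String)));
  ("string_of_int", Arrow (Int, String));
]

(* Names of the string-feature operators usable with the given base types. *)
let names enabled =
  string_env |> List.filter (eligible enabled) |> List.map fst
```

Under this rule, `names [String]` keeps only `(^)`, while `names [String; Int]` also keeps `string_of_int`, so the bridge becomes available exactly when both sub-languages are enabled, without needing a separate "int+string" feature set.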