/synthesis_comparison

We compare several synthesis techniques with respect to their general domain and their applicability to program synthesis specifically.

Synthesis

Synthesis is hard. Already solving ILP and quantifier-free SMT queries is NP-complete. Unbounded cases (as required for richer logics or for programs with loops/recursion) are undecidable.

Synthesis can be seen as a search problem in an enormous search space with (usually) bad heuristics.

Nevertheless, many tools work surprisingly well in practice. Synthesis at scale remains an open problem, though: most tools only generate small programs or fragments with the desired properties.

We look at the current state of tooling: the domain of operation, the results and the guarantees on those results, a description of the technique, the requirements, available implementations, and the maximal instance size on example problems.

We inspect the following techniques:

  • Reinforcement Learning
    • with LLM
    • MCTS
  • AI Planning
  • Genetic Programming (evolutionary algorithms)
    • Genetic Neural Networks
  • Generative Networks
    • Large Language Models
  • Constraint Programming
    • Constraint Satisfaction Networks
    • WFC / Model Synthesis
    • generalized Integer Linear Programming
    • Declarative/Inductive Programming
    • Linear Temporal Logic (Reactive synthesis)
  • SMT
    • Angelic Execution
    • Hole Filling
    • Inductive programming

Techniques

Many of these techniques are already discussed (at their state at the time) by Gulwani et al.¹. We focus on the special task of low-level complete program synthesis and on newer techniques, and give a short summary table comparing the techniques, especially with respect to their requirements and guarantees.

Summary

| Technique | Domain | Requirements | Result | Guarantees | Tools | Examples (Applications) |
|---|---|---|---|---|---|---|
| Reinforcement Learning | Observations (input) and actions (output) | Score function, non-standard encoding of the problem | Action plan(s) | None, but optimizes for the score (can encode a speed requirement) | Gym, Torch | AlphaDev, AlphaTensor |
| AI Planning | Facts about states and actions | Encoding as facts | Action sequence | Correctness, optionally optimality (if a plan exists) | PDDL solvers | |
| Genetic Programming | Genes | Fitness function | Genes | None, but good fitness | PyGAD | |
| Generative NN | Text queries | Pre-trained model | Program code | None | LLMs | GPT, Copilot |
| Constraint Programming | Restrictive | Constraint-specific encoding | Solution | Satisfies all constraints | Prolog, Z3 | |
| SMT | FOL with theories | Encoding | Solution | Correctness | Z3, CVC5 | |

Experiments

https://docs.google.com/spreadsheets/d/18pV_AFn8EqWnFsXg6lTkTJtPIPQgOTxlhqvPVEM_P6M/edit?usp=sharing

For MetaLift (Rosette), we converted the imperative code into a functional style by threading a world state through each instruction.
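To illustrate the idea, here is a minimal, tool-agnostic sketch of that world-state threading in Python; the register names and instruction set are illustrative assumptions, not MetaLift's actual API.

```python
# Sketch of the world-state encoding: each instruction becomes a pure
# function from state to state instead of a mutation of registers.
def mov(state, dst, value):
    return {**state, dst: value}                 # new state, old one untouched

def add(state, dst, src):
    return {**state, dst: state[dst] + state[src]}

# imperative `r0 = 1; r1 = 2; r0 = r0 + r1` as a composition of state updates
state = add(mov(mov({}, "r0", 1), "r1", 2), "r0", "r1")
print(state["r0"])   # 3
```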

Reinforcement Learning

Reinforcement learning encodes the problem as an environment and an actor (agent). The actor is learned and performs actions that change the environment.

In the synthesis setting, the actions form the synthesized program, and the environment holds data that guides the generation.

To decouple the actor from the synthesized program, the environment state has to be universal and independent of concrete instances.

To guide the search, scores (rewards) can be provided after each action (or only at the end). The score should reward a correct (and fast) solution. The more informative the score is (points for partial solutions, intermediate feedback), the better the search works. However, an overly detailed score function may push the agent onto a pre-defined path. Additionally, domain and expert knowledge is needed to craft suitable score functions.

The score allows optimizing for arbitrary criteria. The agent can be made arbitrarily complex (neural networks, LLMs, MCTS, ...). Many hyperparameters are tunable during training.
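As a concrete illustration, the following is a minimal sketch of such an encoding in plain Python, mirroring the reset/step interface of Gym; the toy instruction set, target function, and reward shaping are illustrative assumptions.

```python
import random

TESTS = [0, 1, 2, 5, 9]                    # probe inputs (instance-independent)
TARGET = lambda x: 2 * x + 1               # specification to synthesize
ACTIONS = ["INC", "DOUBLE", "STOP"]        # toy instruction set
MAX_LEN = 6

def run(program, x):
    """Execute the synthesized straight-line program on input x."""
    for op in program:
        if op == "INC":
            x += 1
        elif op == "DOUBLE":
            x *= 2
        else:                              # STOP
            break
    return x

class SynthesisEnv:
    """Actions append instructions; the observation is the program's
    behaviour on the probe inputs, so it is independent of any single
    concrete instance (cf. the decoupling requirement above)."""
    def reset(self):
        self.program = []
        return self._obs()

    def step(self, action):
        self.program.append(ACTIONS[action])
        done = ACTIONS[action] == "STOP" or len(self.program) >= MAX_LEN
        # reward shaping: fraction of test cases matched at the end of an episode
        reward = (sum(run(self.program, x) == TARGET(x) for x in TESTS) / len(TESTS)
                  if done else 0.0)
        return self._obs(), reward, done

    def _obs(self):
        return tuple(run(self.program, x) for x in TESTS)

# random-policy rollout as a usage example (a real agent would learn a policy)
env = SynthesisEnv()
obs, done, reward = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(random.randrange(len(ACTIONS)))
print(env.program, reward)
```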

See Reinforcement Learning for Assembly Synthesis for examples and more details.

Reinforcement learning can be seen as quite similar to genetic programming: the score function plays the role of the fitness of genetic algorithms. The updating and learning, however, are quite different. Genetic algorithms learn in a less focused way but are more generally applicable. See https://medium.com/xrpractices/reinforcement-learning-vs-genetic-algorithm-ai-for-simulations-f1f484969c56 for a comparison.

AI Planning

In planning, the problem is defined by actions and an environment. The environment consists of facts that are true. Actions depend on these facts and can change them.

The problem is a search from a given initial state to a goal state. One does not need to define custom heuristics.

As in reinforcement learning, an action sequence is derived for a given state (in an environment). Unlike RL, the planner knows the environment and the action semantics completely. Hence, planning is more restricted in the choice of environment.

You usually define the problem in PDDL.
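The following is a minimal sketch of the underlying STRIPS-style model, written in plain Python rather than PDDL: states are sets of facts, actions carry preconditions and add/delete effects, and planning is a search from the initial state to a goal. The blocks-world-like facts are illustrative assumptions.

```python
from collections import deque

# action name -> (preconditions, add effects, delete effects)
ACTIONS = {
    "pick_a":   ({"on_table_a", "hand_empty"}, {"holding_a"}, {"on_table_a", "hand_empty"}),
    "stack_ab": ({"holding_a", "on_table_b"}, {"a_on_b", "hand_empty"}, {"holding_a"}),
}
INIT = frozenset({"on_table_a", "on_table_b", "hand_empty"})
GOAL = {"a_on_b"}

def plan(init, goal):
    """Breadth-first search over fact sets; returns a shortest action sequence."""
    frontier, seen = deque([(init, [])]), {init}
    while frontier:
        state, actions = frontier.popleft()
        if goal <= state:                      # all goal facts hold
            return actions
        for name, (pre, add, delete) in ACTIONS.items():
            if pre <= state:                   # action applicable
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, actions + [name]))
    return None

print(plan(INIT, GOAL))   # ['pick_a', 'stack_ab']
```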

Sort experiment: not applicable. The commands need to be (encoded in) the actions. The state needs to encode a superposition/unknown register entry (otherwise, we would sort concrete elements). Actions have to be deterministic given the facts. Together, we cannot conditionally transform the state set without fully (exponentially) unrolling the permutations of the program state and actions.

Genetic Programming

In genetic programming, you define a set of genes and ways to operate on them. The fitness of individuals is computed, and the fittest are selected, crossbred, and mutated.
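A minimal sketch of this loop in plain Python, with an illustrative fitness (matching a target bit-string); with program fragments as genes, the same loop performs genetic programming.

```python
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]           # illustrative goal pattern

def fitness(genes):
    return sum(g == t for g, t in zip(genes, TARGET))

def crossover(a, b):
    cut = random.randrange(1, len(a))        # single-point crossover
    return a[:cut] + b[cut:]

def mutate(genes, rate=0.1):
    return [1 - g if random.random() < rate else g for g in genes]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(30)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                # selection of the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children          # elitism + offspring

best = max(population, key=fitness)
print(best, fitness(best))
```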

Genetic algorithms are widely applicable. You can use network weights as genes, simulate complex behaviour on top of the genes, or evaluate the fitness function directly on the genes.

GAs can solve search and optimization problems.

In combination with neural networks, GAs can learn the weights as well as the hyperparameters. Neuroevolution, including the NEAT algorithm, applies GAs to network topologies and weights. It can also be combined with reinforcement learning.

Generative Networks

These techniques work quite well in practice: language models, usually conditioned on natural language, generate code.

A prime example is the recent development in large language models.

The query is usually given as a completion problem or as a natural-language prompt. With prompting techniques, accuracy can be increased and certain errors (e.g. syntax errors) can be largely avoided.
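A minimal sketch of such a natural-language query, assuming the OpenAI Python client; the model name and prompt are placeholders, and the returned code must be checked separately.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",                       # placeholder model name
    messages=[{
        "role": "user",
        "content": "Write a C function that sorts three integers "
                   "in place using only compares and swaps.",
    }],
)
print(response.choices[0].message.content)     # unverified generated code
```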

However, there are no guarantees about the resulting code. It might be slow or even wrong.

Constraint Programming

This general field (which subsumes SMT) is concerned with solvers that, given a domain and the constraints of a problem, find a solution satisfying all constraints.
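A minimal sketch with Z3 (one of the tools listed above), encoding graph colouring as a constraint problem: the domain is one colour per node, and the constraints forbid equal colours on adjacent nodes. The graph and the number of colours are illustrative assumptions.

```python
from z3 import Int, Solver, Or, sat

EDGES = [(0, 1), (1, 2), (2, 0), (2, 3)]
colors = [Int(f"c{i}") for i in range(4)]      # one variable per node

s = Solver()
s.add([Or([c == k for k in range(3)]) for c in colors])   # domain {0, 1, 2}
s.add([colors[u] != colors[v] for u, v in EDGES])          # adjacency constraints

if s.check() == sat:
    m = s.model()
    print([m[c].as_long() for c in colors])    # any satisfying colouring
```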

These methods can encode arbitrarily complex problems spanning a multitude of complexity classes (NP, PSPACE, 2EXPTIME, undecidable).

The theoretical principles behind the solvers are often similar, as outlined by the reductions between them. However, certain problems are easier to encode in some domains than in others. Additionally, the formulation and encoding can greatly influence how feasible the synthesis becomes.

These techniques were not developed for program synthesis and often require ingenuity and tedious rewrites to be applied.

SMT

SMT generalizes SAT (propositional satisfiability) with theories. Complex semantics (especially of loop-free program fragments) can be encoded in SMT. Loops are not straightforward and usually require approximations (e.g. bounded unrolling).
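A minimal sketch with Z3 of encoding a loop-free fragment with an unknown: we look for a shift amount c such that x << c computes 2 * x for every 8-bit input; the ForAll quantifier encodes correctness on all inputs rather than only on test cases. The fragment and the bit-width are illustrative assumptions.

```python
from z3 import BitVec, ForAll, Solver, sat

x = BitVec("x", 8)
c = BitVec("c", 8)                      # the unknown ("hole") to synthesize

s = Solver()
s.add(ForAll(x, (x << c) == x * 2))     # correct on all 8-bit inputs

if s.check() == sat:
    print(s.model()[c])                 # expected: 1
```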

Subcategories include synthesis where concrete instantiations (e.g. of program code fragments) are generated that do not violate assertions.

  • Suslik (Separation Logic) -- not applicable
  • Synquid (Refinement Types) -- not imperative
  • ReSyn (Resource Types) -- not applicable
  • Dafny (Hoare) -- no synthesis; focus on verification
  • EUSolver (SyGuS) -- functional
  • Euphony (needs training, SyGuS)
  • BlinkFill (semi-supervised; FlashFill + structure from inputs)
  • Brahma (loop-free, counterexample-guided refinement)
  • Sketch
  • Regae (Test generation, regex)
  • Stoke (heuristic search)
  • PROSE (sketch, input-output)
  • Rosette

Related Repositories

Notes:

Papers:

  • ¹ Gulwani et al., Program Synthesis, 2017