This project uses the OpenAI Codex language model for program synthesis.
The idea is to generate programs incrementally, with each step described in its own spec file.
Each spec file is used as a prompt for Codex, which generates a git patch for that spec. The patch is then applied to the files with git apply.
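The generate-then-apply loop can be sketched in Python as below. This is a minimal sketch, not the project's code: `generate_patch` is a hypothetical stand-in for the Codex call, spec files are assumed to be `*.txt` files whose names sort in the order the steps should run, and the loop is assumed to stop at the first patch that `git apply` rejects, since later steps build on earlier ones.

```python
import subprocess
from pathlib import Path
from typing import Callable


def apply_patch(patch_text: str, repo_dir: Path) -> bool:
    """Apply a patch with `git apply`, reading it from stdin ("-")."""
    result = subprocess.run(
        ["git", "apply", "-"],
        input=patch_text,
        text=True,
        capture_output=True,
        cwd=repo_dir,
    )
    return result.returncode == 0


def run_specs(
    spec_dir: Path,
    repo_dir: Path,
    generate_patch: Callable[[str], str],  # hypothetical stand-in for the Codex call
    apply: Callable[[str, Path], bool] = apply_patch,
) -> list[str]:
    """Turn each spec into a patch and apply it; return the names of the applied specs."""
    applied: list[str] = []
    # Assumption: spec files are *.txt and sort in the order they should be applied.
    for spec_path in sorted(spec_dir.glob("*.txt")):
        patch = generate_patch(spec_path.read_text())
        if not apply(patch, repo_dir):
            break  # each step builds on the previous ones, so stop at the first failure
        applied.append(spec_path.name)
    return applied
```

Passing `apply` as a parameter keeps the loop testable without a repository or model access; the default shells out to the real `git apply`.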
A similar idea is described in the paper Evolution through Large Models.
Running git diff with the -p option produces patch text, and the language model generates patches in the same format, so it helps to know how this format is structured.
The patch text is preceded by a git diff header that looks like this:

diff --git a/file1 b/file2

The a/ and b/ filenames are the same unless a rename or copy is involved.
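Reading the two filenames out of such a header can be sketched with a regular expression. The function name is hypothetical, and the sketch assumes paths without whitespace and without quoting (git quotes paths containing unusual characters, which this does not handle).

```python
import re
from typing import Optional

# Matches headers like "diff --git a/old.py b/new.py".
# Assumption: paths contain no whitespace and are not quoted.
_GIT_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$")


def parse_git_diff_header(line: str) -> Optional[tuple[str, str]]:
    """Return (old_path, new_path) from a git diff header, or None if it does not match."""
    match = _GIT_HEADER.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group(1), match.group(2)
```

For example, `parse_git_diff_header("diff --git a/old.py b/new.py")` returns `("old.py", "new.py")`, while any other line, such as `--- a/old.py`, returns `None`.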
git diff compares two files line by line, finds groups of lines that differ, and reports each group together with a few unchanged context lines around it. Each such group is called a hunk.
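Python's standard `difflib` module writes hunks in the same unified format (without git's `diff --git` and `index` lines), which makes it a convenient way to see how differing lines are grouped. The file name and contents in this sketch are made up; `context` is the number of unchanged lines kept around each change, and git also defaults to 3.

```python
import difflib


def hunk_diff(old: list[str], new: list[str], context: int = 3) -> list[str]:
    """Return unified-diff lines comparing old and new, keeping `context` unchanged lines per hunk."""
    return list(
        difflib.unified_diff(old, new, "a/x.txt", "b/x.txt", n=context, lineterm="")
    )


old = [f"line {i}" for i in range(1, 11)]
new = list(old)
new[1] = "line 2 changed"  # change near the top
new[8] = "line 9 changed"  # change near the bottom

for line in hunk_diff(old, new, context=1):
    print(line)
```

With one line of context the two changes are reported as two hunks, `@@ -1,3 +1,3 @@` and `@@ -8,3 +8,3 @@`. With three lines of context the two context regions together cover every unchanged line between the changes, so difflib merges them into a single hunk, `@@ -1,10 +1,10 @@`.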
Example:
git diff old.py new.py > file.diff

old.py:
a
b
c
d
e

new.py:
a
b
c
d
e
f

file.diff:
diff --git a/old.py b/new.py
index 9405325..0fdf397 100644
--- a/old.py
+++ b/new.py
@@ -3,3 +3,4 @@ b
 c
 d
 e
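+f

-3,3: The hunk of the old file starts at line 3 (the line with c) and is 3 lines long (the lines with c, d, e).
+3,4: The hunk of the new file starts at line 3 (the line with c) and is 4 lines long (the lines with c, d, e, f).
The b after the closing @@ is the hunk's section heading: git copies in the nearest line before the hunk that looks like a function header, which by default is any line starting with a letter, an underscore or a dollar sign.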
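The hunk header can be read with a small parser. This is a sketch, not part of the project: the function names are hypothetical, and the count check is an assumed extra safeguard rather than something the project is known to do. A hunk is only consistent when its context and `-` lines add up to the old length and its context and `+` lines add up to the new length, a rule that a model-generated patch can easily break. When a length is omitted from the header (a one-line hunk), it means 1.

```python
import re

# "@@ -3,3 +3,4 @@ b": old start,length then new start,length; an omitted length means 1.
_HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_len, new_start, new_len) from a hunk header line."""
    match = _HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = match.groups()
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


def hunk_counts_match(header: str, body: list[str]) -> bool:
    """Check that the hunk body has as many old and new lines as its header claims."""
    _, old_len, _, new_len = parse_hunk_header(header)
    # Context lines (leading space) belong to both files; "-" only to old, "+" only to new.
    old_lines = sum(1 for line in body if line.startswith((" ", "-")))
    new_lines = sum(1 for line in body if line.startswith((" ", "+")))
    return old_lines == old_len and new_lines == new_len


body = [" c", " d", " e", "+f"]
print(parse_hunk_header("@@ -3,3 +3,4 @@ b"))      # (3, 3, 3, 4)
print(hunk_counts_match("@@ -3,3 +3,4 @@ b", body))  # True
```

Running such a check before git apply would reject a patch whose header numbers disagree with its body, one of the ways a generated patch can be invalid.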
Running the application requires access to the Codex model:

python main_generate_patches.py

The model is capable of generating valid patches for simple specifications. For more complex specifications it fails to generate a valid patch, because it wasn't trained on git commits.