Diff GPT

This project uses the Codex language model for program synthesis. The idea is to generate programs incrementally: each step is described in a spec file, the Codex model takes that spec file as a prompt and generates a git patch for it, and the patch is then applied to the files with git-apply.
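
The incremental loop can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: `build_prompt`, `apply_with_git`, `run_specs`, and the `generate_patch` callable (a stand-in for the Codex request) are hypothetical names, and the prompt layout is a guess.

```python
import subprocess
from typing import Callable, Iterable


def build_prompt(spec_text: str) -> str:
    # Assumed prompt layout: the spec text itself, newline-terminated.
    return spec_text.strip() + "\n"


def apply_with_git(patch_text: str, repo_dir: str = ".") -> bool:
    # "git apply -" reads the patch from standard input.
    result = subprocess.run(
        ["git", "apply", "-"],
        input=patch_text,
        text=True,
        capture_output=True,
        cwd=repo_dir,
    )
    return result.returncode == 0


def run_specs(
    specs: Iterable[str],
    generate_patch: Callable[[str], str],
    apply_patch: Callable[[str], bool] = apply_with_git,
) -> int:
    """Apply one generated patch per spec; return how many were applied."""
    applied = 0
    for spec_text in specs:
        patch_text = generate_patch(build_prompt(spec_text))
        if not apply_patch(patch_text):
            # Later specs build on earlier ones, so stop at the first failure.
            break
        applied += 1
    return applied
```

Passing the generator and the apply step in as callables keeps the loop independent of any particular model API, and lets it be exercised with stand-ins when neither the model nor git is available.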

A similar idea is described in the paper Evolution through Large Models.

Git Patches

Running git-diff with the -p option produces patch text. The language model generates output in the same format, so it helps to know how patch text is structured.

Header

The patch text is preceded by a git diff header that looks like this:

diff --git a/file1 b/file2

The a/ and b/ filenames are the same unless rename/copy is involved.
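
To make the header layout concrete, here is a small sketch that extracts the two filenames from such a line. The function name `parse_diff_header` is hypothetical, and the regular expression assumes plain filenames; quoted filenames or names that themselves contain " b/" are not handled.

```python
import re
from typing import Optional, Tuple

# Matches "diff --git a/<old> b/<new>" for simple, unquoted filenames.
_HEADER = re.compile(r"^diff --git a/(.+) b/(.+)$")


def parse_diff_header(line: str) -> Optional[Tuple[str, str]]:
    """Return (old_name, new_name), or None if the line is not a diff header."""
    match = _HEADER.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_diff_header("diff --git a/file1 b/file2"))
```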

Hunk line numbers

Git diff compares two files line by line, finds groups of lines that differ, and reports each group of differing lines. Groups of differing lines are called hunks.

Example:

git diff --no-index old.py new.py > file.diff

old.py

a
b
c
d
e

new.py

a
b
c
d
e
f

file.diff

diff --git a/old.py b/new.py
index 9405325..0fdf397 100644
--- a/old.py
+++ b/new.py
@@ -3,3 +3,4 @@ b
 c
 d
 e
+f

-3,3: Hunk of old file starts at line 3 (line with c) and has a length of 3 lines (lines with c, d, e).
+3,4: Hunk of new file starts at line 3 (line with c) and has a length of 4 lines (lines with c, d, e, f).
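
The relationship between the hunk header and the hunk body can be checked mechanically, which is one way to spot a generated patch with inconsistent line numbers. The following is a minimal sketch; `parse_hunk_header` and `count_hunk_lines` are hypothetical helper names, not part of this project.

```python
import re
from typing import List, Tuple

# "@@ -<old_start>[,<old_len>] +<new_start>[,<new_len>] @@"
_HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> Tuple[int, int, int, int]:
    """Return (old_start, old_len, new_start, new_len) from a hunk header."""
    match = _HUNK.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = match.groups()
    # A missing length means the hunk covers exactly one line.
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


def count_hunk_lines(body: List[str]) -> Tuple[int, int]:
    """Count how many lines the hunk body spans in the old and new file."""
    old_len = new_len = 0
    for line in body:
        if line.startswith(" "):    # context line, present in both files
            old_len += 1
            new_len += 1
        elif line.startswith("-"):  # line only in the old file
            old_len += 1
        elif line.startswith("+"):  # line only in the new file
            new_len += 1
    return old_len, new_len


if __name__ == "__main__":
    header = "@@ -3,3 +3,4 @@ b"
    body = [" c", " d", " e", "+f"]
    _, old_len, _, new_len = parse_hunk_header(header)
    print((old_len, new_len) == count_hunk_lines(body))
```

For the example above, the header declares 3 old lines and 4 new lines, and the body contains three context lines plus one added line, so the two agree.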

Diff GPT

Run

Running the application requires access to the Codex model.

python main_generate_patches.py

Conclusion

The model can generate valid patches for simple specifications. For more complex specifications it fails to generate a valid patch, because it wasn't trained on git commits.
