YangLing0818/buffer-of-thought-llm

Thought Extractor based on coding repositories

jonathanpv opened this issue · 2 comments

There should be an easy way to ingest a code repository and its git PRs.

Git PRs are prime spaces to implement "thought extractors" or "solution extractors"

For example, a prompt like:

"Given the git diff below, generate a thought process that I will store in a bank of solutions. The thought processes you generate must cover all the bases a software engineer must have gone through to produce the PR.

A thought is basically a solution, e.g.
"Where is the place that we edit the images directory, and how did we solve it?

to solve this question we need to use the
function here_that_solves_the_issuse():

<further explanation of the solution and why it logically makes sense. This should be grounded in the
truth from the PR. ONLY use CODE that is FROM the PR so we have 100% accuracy>
"

Here's the context for this PR:

And here's the diff:

Respond in the following manner. Here are examples of thoughts / solutions / patterns of solutions:

To add a method that .... we need to add ...
func some_thing_here()

Respond and output all thoughts you can, I will continue the conversation and ask for more"

ref: #4
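
A minimal sketch of how the prompt above could be wired up, plus a check for the "ONLY use CODE that is FROM the PR" rule. Everything here is an assumption for illustration: `build_prompt`, `added_lines`, and `is_grounded` are hypothetical names that are not part of this repository, and the template is a shortened paraphrase of the prompt. The grounding check is deliberately crude: it only accepts snippet lines that match an added (`+`) line of the unified diff exactly, and it will skip an added line whose own content starts with `++`.

```python
# Hypothetical helpers; none of these names exist in buffer-of-thought-llm.
PROMPT_TEMPLATE = """Given the git diff below, generate thought processes that I will store in a bank of solutions.
ONLY use CODE that is FROM the PR.

Here's the context for this PR:
{context}

And here's the diff:
{diff}
"""


def build_prompt(context: str, diff: str) -> str:
    """Fill the extractor prompt with the PR description and its diff."""
    return PROMPT_TEMPLATE.format(context=context, diff=diff)


def added_lines(diff: str) -> set:
    """Collect the stripped text of lines the diff adds ('+' lines, not '+++' file headers)."""
    return {
        line[1:].strip()
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++") and line[1:].strip()
    }


def is_grounded(code_snippet: str, diff: str) -> bool:
    """True if every non-blank line of the snippet appears among the diff's added lines."""
    allowed = added_lines(diff)
    snippet_lines = [ln.strip() for ln in code_snippet.splitlines() if ln.strip()]
    return bool(snippet_lines) and all(ln in allowed for ln in snippet_lines)


example_diff = (
    "--- a/x.py\n"
    "+++ b/x.py\n"
    "@@ -1 +1,2 @@\n"
    " def f():\n"
    "+    return 1\n"
)
prompt = build_prompt("Make f return a value", example_diff)
```

A thought whose code fails `is_grounded` could be dropped or sent back to the model for another attempt before it is stored.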

Paired with a code runner, this could create a high-quality dataset for any repository, ALTHOUGH the runner may be unnecessary: if a PR was merged, we can assume it already compiled.

A further step could be introduced where we REMOVE portions of logic FROM the PR, introducing FLAWS on PURPOSE, and then extract the error logs from the compiler. Since we know what the mistake was (the portions we removed), we already know the solution as well...
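
The flaw-injection step could be sketched roughly as follows. This is an assumption-heavy illustration, not a proposed implementation: `inject_flaw` and `capture_error` are hypothetical names, the "portion of logic" is a single line, and Python's built-in `compile`/`exec` stand in for whatever compiler or test runner the target repository actually uses. Since `exec` runs the code, this should only be pointed at code you trust or run inside a sandbox.

```python
from typing import Optional, Tuple


def inject_flaw(source: str, line_index: int) -> Tuple[str, str]:
    """Remove one line from the source; return (flawed_source, removed_line)."""
    lines = source.splitlines()
    removed = lines.pop(line_index)
    return "\n".join(lines) + "\n", removed


def capture_error(source: str) -> Optional[str]:
    """Compile and run the source in a fresh namespace; return the error text, if any."""
    try:
        exec(compile(source, "<flawed>", "exec"), {})
    except Exception as exc:  # a real pipeline would read the compiler or test log instead
        return f"{type(exc).__name__}: {exc}"
    return None


original = (
    "def area(w, h):\n"
    "    result = w * h\n"
    "    return result\n"
    "\n"
    "area(2, 3)\n"
)

# Delete the assignment on purpose, then record what breaks.
flawed, removed_line = inject_flaw(original, 1)
error = capture_error(flawed)
```

Here `removed_line` is the known fix and `error` is the symptom it resolves, so the pair can be stored without asking a model to guess the solution.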

A mapping of flaw / error -> solution can also be created and stored in the buffer of thoughts.

That means we can recover the thought processes the engineer faced during the actual code writing. Based on the merged code, we travel back through its history and discover the THOUGHTS of the engineer who authored the PR.
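
One way the flaw / error -> solution mapping could be stored and queried is sketched below. The names `Thought` and `ThoughtBuffer` are hypothetical and are not the buffer implementation in this repository. Fuzzy string matching with the standard-library `difflib` is a stand-in chosen for brevity; a real system would more likely use embedding-based retrieval.

```python
import difflib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thought:
    error: str     # the flaw or error message that was observed
    solution: str  # the fix, taken from the merged PR


class ThoughtBuffer:
    """Toy in-memory store mapping error messages to known solutions."""

    def __init__(self) -> None:
        self._items: List[Thought] = []

    def add(self, error: str, solution: str) -> None:
        self._items.append(Thought(error, solution))

    def lookup(self, error: str, cutoff: float = 0.6) -> Optional[Thought]:
        """Return the stored thought whose error text is most similar, if similar enough."""
        best, best_score = None, cutoff
        for thought in self._items:
            score = difflib.SequenceMatcher(None, error, thought.error).ratio()
            if score >= best_score:
                best, best_score = thought, score
        return best


buffer = ThoughtBuffer()
buffer.add(
    "NameError: name 'result' is not defined",
    "Restore the line `result = w * h` before the return statement.",
)

# A new error with the same shape can retrieve the earlier solution.
hit = buffer.lookup("NameError: name 'total' is not defined")
miss = buffer.lookup("zzz")
```

The `cutoff` value of 0.6 is arbitrary; it only controls how different an error message may be before the buffer declines to suggest a stored solution.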

GitHub PRs are rich and knowledge-dense. We can expand them using agents and then keep that data / buffer of thoughts in your agent framework.

That should be AGI, faster than Devin, basically a billion-dollar product, and it could replace many software engineers.

@YangLing0818 would love your thoughts