Higher-level refactoring w/ many files
I'm realizing that w/ GPT, refactoring can happen at a whole new level now; I guess we'd call it semantic refactoring? The only problem is token limits.
Are there things we could do via an LSP client (Language Server Protocol) that would help w/ the token limitations while allowing larger-scale refactoring? E.g. lsp-mode in Emacs might be a good starting point?
Would it make sense to enable discussions on this repo? For dumb questions like this? ;)
Given GPT cannot yet reliably produce unified diffs, perhaps simple search/replace strings could suffice in the meantime for automated editing/patching of existing codebases?
Again, this project rocks!!! Love it!
Something I’ve been playing with: summaries.
For example:
- “Create a function that does ____. Provide a summary of inputs and expected outputs”
- “Load the function summary when making the next request: “Create a component that does _______. Available functions in this project: (list of current function summaries).”
A refactor can then have a first pass where each function/class/pattern gets summarized and documented, then a second pass where GPT refactors each piece within the token window.
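A minimal sketch of that two-pass idea, assuming the current `openai` Python client; `extract_functions` is a hypothetical per-language helper, and the prompts are just the ones from the bullets above:

```python
# Sketch: the two-pass summarize-then-refactor idea.
# Assumes the openai Python client; extract_functions is a hypothetical
# helper that pulls each function's source out of the project.
from openai import OpenAI

client = OpenAI()

def summarize(source: str) -> str:
    """Pass 1: summarize a function's inputs and expected outputs."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Summarize this function: name, inputs, expected outputs. Be terse."},
            {"role": "user", "content": source},
        ],
    )
    return resp.choices[0].message.content

def refactor(source: str, summaries: list[str]) -> str:
    """Pass 2: refactor one function; the rest of the project is present only as summaries."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Refactor the given function. Available functions in this project:\n" + "\n".join(summaries)},
            {"role": "user", "content": source},
        ],
    )
    return resp.choices[0].message.content

functions = extract_functions("src/")  # hypothetical per-language helper
summaries = [summarize(f) for f in functions]
refactored = [refactor(f, summaries) for f in functions]
```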
Yeah, automated code refactoring is definitely something you can hack together with ShellGPT, and something I'm also very interested in.
ChatGPT does not seem natively very good at refactoring. It often omits code with a placeholder comment and doesn't tend to apply deeper changes, at least in my naive tests. So this requires more cleverness and specificity in prompting.
You'd then want to augment with some sort of smart retrieval mechanism for token limits, e.g. vector embeddings or on-demand file loading (running `cd`, `ls`, `cat`, etc.). I've written some things on top of ShellGPT for this, which I might publish eventually, but it's not too complicated to bootstrap yourself.
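For the embeddings route, a minimal retrieval sketch (assuming the `openai` client and one embedding model that exists today; the chunking here is naive, one file per chunk):

```python
# Sketch: naive embedding retrieval over project files, to pull only the most
# relevant files into the prompt. Assumes the openai client; one file per chunk.
import math
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Index every file once, then rank files against the query and keep the top few.
index = {p: embed(p.read_text()) for p in Path("src").rglob("*.py")}
query = embed("where is the config file parsed?")
top = sorted(index, key=lambda p: cosine(index[p], query), reverse=True)[:3]
```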
Ideally you'd also have some typechecking or compilation feedback loop, like https://github.com/biobootloader/wolverine, automatically repairing errors that arise. The LSP idea is great too.
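That loop takes surprisingly little code to bootstrap. A sketch, using `py_compile` as the cheapest possible checker (swap in mypy, `tsc`, or LSP diagnostics for real use); `ask_gpt_to_fix` is a hypothetical prompt helper (code + error in, fixed code out):

```python
# Sketch: a wolverine-style compile-and-repair loop. py_compile is a stand-in
# checker; ask_gpt_to_fix is a hypothetical prompt helper.
import py_compile

def check(path: str) -> str | None:
    """Return the error text if the file fails to compile, else None."""
    try:
        py_compile.compile(path, doraise=True)
        return None
    except py_compile.PyCompileError as e:
        return str(e)

def repair_loop(path: str, max_rounds: int = 5) -> bool:
    for _ in range(max_rounds):
        error = check(path)
        if error is None:
            return True  # clean compile, done
        source = open(path).read()
        open(path, "w").write(ask_gpt_to_fix(source, error))  # hypothetical
    return False
```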
(Will leave this as an issue for now, as it's easier to keep track of than a separate discussions tab)
Playing around w/ the Code Interpreter for ChatGPT Plus, while creating serdes for length-delimited strings, I watched GPT be smart enough to realize it could check itself by round-tripping the serde, w/o me touching a single key. Via multiple trial/error runs, on its own, GPT-4 was able to figure out the correct code using the Python interpreter. My jaw hit the floor as I watched, lol. I'm thinking we could do the same thing here w/ LSPs and various language compilers? All in a Docker container?
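For reference, the kind of self-check it discovered is trivial to run in a harness, so generated code can be verified without a human in the loop. A sketch of a length-delimited string serde with the round-trip assertion:

```python
# Sketch: a length-delimited string serde plus the round-trip self-check
# GPT-4 stumbled onto. If the assert holds, the pair is self-consistent.
import struct

def encode(strings: list[str]) -> bytes:
    out = b""
    for s in strings:
        data = s.encode("utf-8")
        out += struct.pack(">I", len(data)) + data  # 4-byte big-endian length prefix
    return out

def decode(buf: bytes) -> list[str]:
    strings, i = [], 0
    while i < len(buf):
        (n,) = struct.unpack_from(">I", buf, i)
        strings.append(buf[i + 4 : i + 4 + n].decode("utf-8"))
        i += 4 + n
    return strings

sample = ["hello", "", "naïve ☃"]
assert decode(encode(sample)) == sample  # round-trip: decode(encode(x)) == x
```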
One thing I'm unsure of, for the above automated loop: is it better to use system prompts or function calls for the automated GPT-4-driven trial/error editing/testing?
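For what it's worth, function calls give the model a structured way to request each step of the loop instead of relying on it to obey a system prompt. A sketch of what the tool definitions might look like (illustrative schema, not anything ShellGPT ships):

```python
# Sketch: function-call (tool) definitions for a trial/error edit-and-test loop.
# The driver would dispatch these calls and feed the results back to the model.
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run the project's test suite and return the output.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Overwrite a file with new contents.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "contents": {"type": "string"},
                },
                "required": ["path", "contents"],
            },
        },
    },
]
# Passed as: client.chat.completions.create(model="gpt-4", messages=..., tools=tools)
```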
As the complexity of a codebase grows, confidence in large refactors could be increased by having GPT-4 write lots of tests along the way?
Per the summaries idea mentioned above, I wonder if emacs/etags output filtered via grep, using the symbols of the current refactor target, could be another way?
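That doesn't even need Emacs in the loop; a sketch using plain grep to collect just the regions mentioning the refactor target's symbols (a crude stand-in for a real etags/ctags lookup, with hypothetical symbol names):

```python
# Sketch: grep-based context gathering, a crude stand-in for an etags/ctags
# lookup. Only the regions mentioning the target symbols go into the prompt.
import subprocess

def context_for(symbols: list[str], root: str = "src") -> str:
    chunks = []
    for sym in symbols:
        # -rn: recursive, with line numbers; -C 3: three lines of context.
        result = subprocess.run(
            ["grep", "-rn", "-C", "3", sym, root],
            capture_output=True, text=True,
        )
        if result.stdout:
            chunks.append(f"### {sym}\n{result.stdout}")
    return "\n\n".join(chunks)

print(context_for(["parse_config", "ConfigError"]))  # hypothetical symbols
```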
There is also a server mode for Emacs; would there be any value in having GPT-4 "drive" it?
Please forgive all the rambling, lol, I have a very active mind ;)
Not saying I've never been frustrated w/ GPT-4, but the more I use it, the more it starts to feel like skiing w/ someone. A sort of "flow" starts to happen.
Let's ski this AI wave guys ;)
@mattvr I'm a dummy on vector embeddings; with respect to coding, what's the best way for me to understand them better? Also, would the GPT fine-tuning coming in the fall be applicable here as well?
@mattvr I "think" GPT function calls could help fix the "deeper changes" problem?
They could also be used as a way to do search/replace/append editing of files from within the `-x` switch, or something similar?
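A sketch of what the driver side of such an edit function could look like (names and behavior are hypothetical, not an existing ShellGPT feature):

```python
# Sketch: the driver side of a search/replace/append edit function GPT could
# call instead of emitting unified diffs. Entirely hypothetical behavior.
def apply_edit(path: str, search: str, replace: str, append: bool = False) -> None:
    """Replace the first exact match of `search`, or append when requested."""
    text = open(path).read()
    if append:
        text += replace
    elif search not in text:
        raise ValueError(f"search string not found in {path}")
    else:
        text = text.replace(search, replace, 1)  # first occurrence only
    open(path, "w").write(text)
```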
Sorry for missing some of the comments here. I found https://github.com/paul-gauthier/aider recently, which looks to help with refactoring files. Could be a good starting point!