[Discussion] Is the Top-Down pipeline of Agentless inherently limited?

Question

Opened this issue 5 months ago · 1 comments

Hi, I am really impressed by the Agentless work. I think it is novel, intuitive, and clean. However, when I dug into the code and pipeline, I found a few limitations in the current design.

Compared to agentic methods, Agentless does not use specific tools or an interactive pipeline. Instead, Agentless uses a hierarchical method that gradually narrows down to files, then functions/classes/variables, and then lines, in order to collect the fault context. Once it has all the context, the model generates the fix patch.
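To make the narrowing concrete, here is a minimal sketch of the three stages as I understand them. Everything in it is an assumption for illustration: `Selector` stands in for a model call, `keyword_selector` is a toy substitute that matches words from the issue text, and the `Repo` layout is hypothetical. It is not the Agentless implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a model call: given the issue text and a list of
# candidate strings, return the candidates judged relevant.
Selector = Callable[[str, List[str]], List[str]]

# Hypothetical repo layout: file path -> function name -> source lines.
Repo = Dict[str, Dict[str, List[str]]]


def keyword_selector(issue: str, candidates: List[str]) -> List[str]:
    """Toy selector: keep candidates containing a word (4+ chars) from the issue."""
    words = {w.lower().strip(".,?!") for w in issue.split()}
    words = {w for w in words if len(w) >= 4}
    return [c for c in candidates if any(w in c.lower() for w in words)]


def localize(issue: str, repo: Repo, select: Selector) -> Dict[Tuple[str, str], List[str]]:
    """Narrow top-down: files, then functions, then lines."""
    context: Dict[Tuple[str, str], List[str]] = {}
    for path in select(issue, sorted(repo)):                # stage 1: file names only
        for name in select(issue, sorted(repo[path])):      # stage 2: function names only
            context[(path, name)] = select(issue, repo[path][name])  # stage 3: lines
    return context
```

Note that each stage only sees what the previous stage kept, so anything dropped early can never reach patch generation.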

However, I believe that at each step of fault localization, the information available is too incomplete to support that step.

1. For file localization, the input is the problem description plus the repo structure rendered as a directory tree. The model is expected to output the files that may need modification. In my opinion, this information is very limited, since file names can be uninformative.
2. For function/class/variable localization, the only currently supported option is to provide the problem description and the compressed skeleton code. Without seeing the actual implementation of each class/function, how can the model know which one to modify if only names are provided?
3. Once the classes/functions/variables are collected, context windows are constructed and fed into the model for patch generation. But what if the patch requires adding imports, helper functions, or global variables? That context will not be retrieved in step 2, and the model is constrained to generate a patch based only on the retrieved context.
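As a rough illustration of how little the first two steps expose, the sketch below renders a name-only directory tree and a signature-only skeleton using Python's standard `ast` module. The function names `render_tree` and `skeleton` and the output format are my own assumptions, not the format Agentless actually uses.

```python
import ast
from typing import List


def render_tree(paths: List[str]) -> str:
    """Render '/'-separated file paths as an indented tree of names only."""
    tree: dict = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})

    lines: List[str] = []

    def walk(node: dict, depth: int) -> None:
        for name in sorted(node):
            lines.append("    " * depth + name)
            walk(node[name], depth + 1)

    walk(tree, 0)
    return "\n".join(lines)


def skeleton(source: str) -> str:
    """Keep class and function signatures; drop bodies, imports, and globals."""
    out: List[str] = []

    def visit(body: list, depth: int) -> None:
        pad = "    " * depth
        for node in body:
            if isinstance(node, ast.ClassDef):
                out.append(f"{pad}class {node.name}:")
                visit(node.body, depth + 1)
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Simplification: only plain positional parameters are listed.
                args = ", ".join(a.arg for a in node.args.args)
                out.append(f"{pad}def {node.name}({args}): ...")

    visit(ast.parse(source).body, 0)
    return "\n".join(out)
```

In this sketch the skeleton carries no imports, no globals, and no function bodies, which is exactly the information a patch in step 3 may turn out to need.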

I think maybe tools are something ugly that we nevertheless have to use? The top-down static method that Agentless currently uses seems limited. I have been thinking hard about this, and collecting enough repo context with a static pipeline seems infeasible, at least to me.

I wonder what you think about this?

Answer 1 · 2024-08-10T03:39:43.000Z

Hi @Randolph-zeng, sorry for the late reply; we have been busy with some recent deadlines.

Thanks for your question and discussion.

> In my opinion, the information is very limited since the file names can be uninformative ... Without knowing the actual implementations in each class/function, how could the model know which function/class to modify if only names are provided

This is definitely true, and it is something we are trying to improve on by making more use of source code information.

> This context will not be retrieved from step 2 and models are constrained to generate patch based on retrieved contexts

Totally agreed. If a fix requires adding imports, it might be difficult for the current setup to handle.
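For discussion only, one static way to surface this kind of context would be to attach each located file's top-level imports and simple global assignments to the context window. The sketch below uses Python's standard `ast` module; the helper name `module_preamble` is hypothetical, and this is an assumption about a possible mitigation, not something Agentless currently does.

```python
import ast
from typing import List


def module_preamble(source: str) -> List[str]:
    """Return the source text of top-level imports and global assignments."""
    tree = ast.parse(source)
    keep = (ast.Import, ast.ImportFrom, ast.Assign, ast.AnnAssign)
    # Only direct children of the module are inspected, so imports nested
    # inside functions or classes are intentionally left out.
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, keep)
    ]
```

This would not help when a patch needs a brand-new helper function, but it would at least show the model which names are already imported or defined at module level.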

> I think maybe tools are something ugly but we have to use ?

In this work we just want to show that a simple agentless pipeline can be useful as well. This is not to say that we should not use agents in the future (I also agree that agents are the way to go). But the process of getting there needs to be grounded: we should design methods and approaches that make sense and that let us test whether each component of an agent-based approach actually helps.

Thanks for these questions and discussion points.