Test Execution Environment for SweBench tasks
345ishaan opened this issue · 3 comments
I have recently been working on swebench, where we built a distributed eval on top of Modal for faster eval cycles. As a next step, I was hoping to use that setup to execute the patches generated by LLMs after the localization stage. Is this possible via the commit0 project?
Test execution feedback and search can improve quality over Best-of-N or majority-voting approaches. As part of this idea, we need to either predict the relevant unit tests that exercise the localized files or generate unit tests using LLMs.
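To make the comparison concrete, here is a minimal sketch of picking a patch by test results versus by majority vote. All names here (`majority_vote`, `select_by_tests`, `run_tests`) are hypothetical and not part of commit0 or swebench; `run_tests` stands in for whatever remote executor (e.g. a Modal job) returns per-test pass/fail results for a candidate patch.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(patches: Sequence[str]) -> str:
    """Baseline: return the patch text that was generated most often."""
    return Counter(patches).most_common(1)[0][0]


def select_by_tests(
    patches: Sequence[str],
    run_tests: Callable[[str], Sequence[bool]],
) -> str:
    """Return the patch with the most passing tests.

    `run_tests` is a hypothetical callback that applies a patch and
    returns one boolean per executed test. Ties go to the earliest
    candidate, since max() keeps the first maximal item.
    """
    return max(patches, key=lambda patch: sum(run_tests(patch)))
```

With candidates `["a", "a", "b"]` where only `"b"` passes every test, majority voting would return `"a"` while test-based selection would return `"b"`.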
Thank you for your interest! Our current setup does something similar: we copy the patch to Modal and run the eval there. However, our code cannot be used directly on swebench; some modifications are needed. If you already have a plan, I'm happy to help out, but supporting commit0 on swebench is not our priority at the moment, and I am not sure when we will get to it.
Update: the integration is in progress. We will have a release next week.
@wenting-zhao Thanks a lot. Looking forward to trying it out and contributing.