princeton-nlp/SWE-bench
[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world Github Issues?
Python, MIT
Issues
Evaluation hangs on "Building environment images"
#245 opened by thetonywu - 0
Jinja2 is not pinned for sphinx
#241 opened by SmartManoj - 1
Want apptainer support
#242 opened by HotineXie - 3
Errors building Matplotlib env instances
#239 opened by martinbel - 4
Want a docker image tar file
#211 opened by WentaoTan - 3
Issues with data collection - supported?
#188 opened by kwanUm - 1
astropy__astropy-14182 - `test_patch` includes Unnecessary `read` Method Modifications
#222 opened by SmartManoj - 5
docker evaluation gets stuck
#157 opened by crhf - 1
Excessive memory usage in conda 23.11.0
#231 opened by beornf - 1
Questions about RAG baselines
#230 opened by dlibk - 1
Dependencies version in constants.py
#229 opened by SZU-ZJW - 1
Running Evaluation on Runpod.
#220 opened by saadan1234 - 1
Clarification on Identifying fail2pass and pass2pass Test Cases for an Instance
#195 opened by gnohgnailoug - 3
failure to build env image for astropy__astropy-7606
#224 opened by kjslag - 8
Yanked package `types-pkg-resources` causes failures when evaluating on `sqlfluff`
#199 opened by klieret - 1
matplotlib__matplotlib-23476 failed at pre-install
#210 opened by HejiaZ2023 - 2
scikit-learn__scikit-learn images built error
#218 opened by JiyangZhang - 1
ValueError: Could not find requirements.txt at paths ['tests/requirements/py3.txt'] for repo django/django
#198 opened by hyyp1 - 2
Test for human falsehoods
#208 opened by MovGP0 - 3
How to inference without docker?
#194 opened by WentaoTan - 1
404 for tutorial links in the README
#209 opened by fhfonsecaa - 0
`UnicodeDecodeError` when running gold patch for `django__django-14011` in the dockerized harness
#215 opened by blahblahasdf - 1
base_commit & patch how to use?
#201 opened by xlisp - 0
image build fail in issue pallets__flask-4045
#196 opened by tangken333 - 2
Benchmarks and leaderboards published on your website are out of date. Please update them.
#191 opened by Emasoft - 6
Failing benchmark instances
#167 opened by aorwall - 4
Why SWE-Bench Train does not contains data of "test_patch"? I could not find them.
#182 opened by BoxiYu - 2
Where can I find training set to train swe-llama?
#180 opened by Hodge931 - 2
Confused about the usage of fields `test_patch`, `PASS_TO_PASS` and `FAIL_TO_PASS`
#174 opened by DavdGao - 3
Which Python version to use?
#156 opened by anupamme - 1
Missing `validation.ipynb`?
#163 opened by xingyaoww - 0
Passed test case count as failure?
#165 opened by xingyaoww - 0
Cannot load dataset from JSON file
#150 opened by klieret - 2
swe-bench can get badly stuck in `future.result()`
#158 opened by klieret