turboderp/exui

Out of memory

AICodingTester opened this issue · 16 comments

I have an issue where, in an older version, the GPU VRAM would fill up first and then the system RAM after that. Since I updated, both fill up at the same time, causing this error:

RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

There isn't a lot of stack trace there. Do you have some more?

Also it would help to know which version didn't have this issue. Normally it's not supposed to fill up system RAM at all, but recent NVIDIA drivers have a somewhat unreliable system memory swap feature that might have been doing that? Really hard to debug without more information.
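In the meantime, setting CUDA_LAUNCH_BLOCKING=1 (as the error text itself suggests) makes kernel launches synchronous, so the reported stack trace should point at the actual failing call. A minimal sketch, placed at the very top of server.py before anything touches CUDA:

import os

# Must be set before torch initializes CUDA, or it has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"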

I don't have much more information to provide; when it loads and then fails, this is the only error I receive. I haven't used exui in a few months, but I recently updated everything, including every dependency. I'm also on NVIDIA driver version 555.41, if that helps. I've experimented a bit with the system memory fallback in the NVIDIA control panel, but sadly to no avail.

Don't you get more text with the error message than that? There's usually a stack trace to show which function raised the exception.

The latest Windows update seems to have resolved the out-of-memory issue I was experiencing. However, it has also significantly slowed down the inference performance.

Before the update:

Inference speed: 15-20 tokens per second (t/s)
Out-of-memory issue: Present

After the update:

Inference speed: 1-2 tokens per second (t/s)
Out-of-memory issue: Resolved
That is roughly a tenfold slowdown.

Unfortunately, I don't have a stack trace to provide for further analysis of the performance issue. The slowdown seems to be a direct consequence of the recent Windows update, as no other changes were made to the system configuration.

For reasons I can't pinpoint, running server.py now only opens a blank command line, even after restarting my PC. Despite reinstalling Python, Torch, and ExLlama, the issue persists. Interestingly, ComfyUI starts up without a hitch, but server.py within exui remains stuck and inactive.

I've been trying to troubleshoot the server.py issue by following various steps, such as creating a new virtual environment, installing the required dependencies, and setting the CUDA_HOME environment variable. However, despite these efforts, the issue persists.
I've also used the python -m trace --trace command to trace the execution of server.py, and it revealed a loop in file_baton.py that checks for the existence of a lock file. The loop continues indefinitely, preventing the server from starting properly. The trace repeats this same block endlessly:

--- modulename: genericpath, funcname: exists
genericpath.py(18): try:
genericpath.py(19): os.stat(path)
genericpath.py(22): return True
file_baton.py(42): time.sleep(self.wait_seconds)
file_baton.py(41): while os.path.exists(self.lock_file_path):

[... the same lines repeat indefinitely ...]

As far as I can tell this is an issue with the PyTorch extension build system. Best advice I can give is to clear out your ~/.cache/torch_extensions directory, delete any ./build/ folder in ExLlama's repo directory and try again.
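If it helps, here's a small Python sketch that clears the usual cache locations in one go (the Windows path is an assumption based on the default AppData layout, and TORCH_EXTENSIONS_DIR overrides the defaults if set):

import os, shutil

# Possible locations of the Torch extension build cache.
candidates = [
    os.environ.get("TORCH_EXTENSIONS_DIR"),                   # explicit override, if set
    os.path.expanduser("~/.cache/torch_extensions"),          # Linux default
    os.path.expandvars(r"%LOCALAPPDATA%\torch_extensions"),   # Windows default
]

for path in candidates:
    if path and os.path.isdir(path):
        print("removing", path)
        shutil.rmtree(path)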

There's one more thing that's a bit of a long shot, but I have had to do it when profiling in nsys, since it seems to get confused as to what venv to use: uninstall the ninja package with pip and install ninja system-wide instead with your OS package manager.
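Concretely, something like this (the apt package name is an example; it differs between distros):

pip uninstall ninja
sudo apt install ninja-build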

If none of this works there's also the prebuilt wheels of course.

Thanks for the suggestions! I've tried clearing out the ~/.cache/torch_extensions directory, but it turns out I don't have that folder on my system. I wasn't sure if I should create it manually, but I figured PyTorch would probably create it automatically if needed.
I went ahead and deleted the ./build/ folder in the ExLlama repo directory, just to make sure there weren't any lingering files causing issues.
I also uninstalled the ninja package using pip and then installed it system-wide using the package manager for my OS, as you suggested. I thought this might help resolve any confusion with the virtual environment.
After all that, I tried running my script again, but unfortunately, I'm still seeing the same error message related to the PyTorch extension build system.
At this point, I'm kind of at a loss and not sure what else to try. If you have any other ideas or suggestions, I'm all ears! I'm wondering if there might be something specific to my setup or environment that's causing this issue.
Let me know if you need any additional information or if there are any other troubleshooting steps I can try.

The file_baton stuff is a mechanism used by Torch during the extension build process. See here. It shouldn't have anything to wait for if there isn't a ~/.cache/torch_extensions folder at all, so it's very strange.
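For context, the baton is roughly this pattern (a simplified sketch of what your trace shows, not Torch's exact code):

import os, time

class FileBaton:
    def __init__(self, lock_file_path, wait_seconds=0.1):
        self.lock_file_path = lock_file_path
        self.wait_seconds = wait_seconds
        self.fd = None

    def try_acquire(self):
        # Atomic exclusive create: exactly one process wins the baton.
        try:
            self.fd = os.open(self.lock_file_path, os.O_CREAT | os.O_EXCL)
            return True
        except FileExistsError:
            return False

    def wait(self):
        # The loop your trace is stuck in: if the process holding the
        # baton died without deleting the lock file, this spins forever.
        while os.path.exists(self.lock_file_path):
            time.sleep(self.wait_seconds)

    def release(self):
        os.close(self.fd)
        os.remove(self.lock_file_path)

A stale lock file left behind by a crashed build would make every later build wait forever, which is why deleting the cache folder usually fixes it.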

Could you list the output of:

pip show torch ninja exllamav2
nvcc --version
gcc --version

C:\Users\timoe\exui>pip show torch ninja exllamav2
Name: torch
Version: 2.3.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: c:\users\timoe\appdata\local\programs\python\python310\lib\site-packages
Requires: filelock, fsspec, jinja2, mkl, networkx, sympy, typing-extensions
Required-by: exllamav2

Name: ninja
Version: 1.11.1.1
Summary: Ninja is a small build system with a focus on speed
Home-page: http://ninja-build.org/
Author: Jean-Christophe Fillion-Robin
Author-email: scikit-build@googlegroups.com
License: Apache 2.0
Location: c:\users\timoe\appdata\local\programs\python\python310\lib\site-packages
Requires:
Required-by: exllamav2

Name: exllamav2
Version: 0.0.19
Summary:
Home-page: https://github.com/turboderp/exllamav2
Author: turboderp
Author-email:
License: MIT
Location: c:\users\timoe\appdata\local\programs\python\python310\lib\site-packages
Requires: fastparquet, ninja, numpy, pandas, pygments, regex, safetensors, sentencepiece, torch, websockets
Required-by:

C:\Users\timoe\exui>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

C:\Users\timoe\exui>gcc --version
The command "gcc" is either misspelled or could not be found.

I had somehow assumed you were on Linux. On Windows, you want to look for the torch_extensions folder in C:\Users\timoe\AppData\Local\torch_extensions instead. If it's there, try deleting it and see if that helps.

The exllamav2 version you have installed there is the JIT version, just to be clear. It shouldn't ever use the Torch extension build system if you use a prebuilt wheel instead. Have you tried this:

pip uninstall exllamav2
pip install https://github.com/turboderp/exllamav2/releases/download/v0.0.19/exllamav2-0.0.19+cu121-cp310-cp310-win_amd64.whl
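After installing the wheel, a quick smoke test is to import the prebuilt extension directly (the same module the JIT path would otherwise have to build):

python -c "import exllamav2_ext; print('extension loaded')"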

I've followed these steps and now get this error:

C:\Users\timoe\exui>python server.py
Traceback (most recent call last):
  File "C:\Users\timoe\exui\server.py", line 11, in <module>
    from backend.models import update_model, load_models, get_model_info, list_models, remove_model, load_model, unload_model, get_loaded_model
  File "C:\Users\timoe\exui\backend\models.py", line 5, in <module>
    from exllamav2 import (
  File "C:\Users\timoe\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "C:\Users\timoe\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\model.py", line 23, in <module>
    from exllamav2.config import ExLlamaV2Config
  File "C:\Users\timoe\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\config.py", line 3, in <module>
    from exllamav2.fasttensors import STFile
  File "C:\Users\timoe\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\fasttensors.py", line 6, in <module>
    from exllamav2.ext import exllamav2_ext as ext_c
  File "C:\Users\timoe\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\ext.py", line 19, in <module>
    import exllamav2_ext
ImportError: DLL load failed while importing exllamav2_ext: The specified module could not be found.

PyTorch 2.3 was released yesterday. I haven't tried it yet myself, but I do know that the Torch people like to completely break backwards compatibility with every new release. I'll be releasing 0.0.20 soon with prebuilt wheels compiled against Torch 2.3.0 (if that works, hard to predict) but in the meantime you'd probably have more luck downgrading to PyTorch 2.2.0.
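For example, installing from the same cu121 index the exllamav2 wheel above was built against:

pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121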

I've downgraded to PyTorch 2.2.0, but the same error somehow persists:

Installing collected packages: torch
Successfully installed torch-2.2.0

C:\Users\timoe\exui>python server.py
Traceback (most recent call last):
  File "C:\Users\timoe\exui\server.py", line 11, in <module>
    from backend.models import update_model, load_models, get_model_info, list_models, remove_model, load_model, unload_model, get_loaded_model
  File "C:\Users\timoe\exui\backend\models.py", line 5, in <module>
    from exllamav2 import (
  File "C:\Users\timoe\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "C:\Users\timoe\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\model.py", line 23, in <module>
    from exllamav2.config import ExLlamaV2Config
  File "C:\Users\timoe\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\config.py", line 3, in <module>
    from exllamav2.fasttensors import STFile
  File "C:\Users\timoe\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\fasttensors.py", line 6, in <module>
    from exllamav2.ext import exllamav2_ext as ext_c
  File "C:\Users\timoe\AppData\Local\Programs\Python\Python310\lib\site-packages\exllamav2\ext.py", line 19, in <module>
    import exllamav2_ext
ImportError: DLL load failed while importing exllamav2_ext: The specified module could not be found.

C:\Users\timoe\exui>

There's something really weird about your setup, I think. The failure happens here:

build_jit = False
try:
    import exllamav2_ext
except ModuleNotFoundError:
    build_jit = True

You're somehow getting an ImportError complaining about the module not being found, instead of the ModuleNotFoundError that's normally raised if a module isn't found.

I think the implication is that the prebuilt extension has been installed, but there's some conflict, perhaps with another library called exllamav2 or exllamav2_ext in your path. Maybe the sane thing to do would be to start again with a new venv?

python -m venv test_env
test_env\Scripts\activate
pip install https://download.pytorch.org/whl/cu121/torch-2.2.0%2Bcu121-cp310-cp310-win_amd64.whl
pip install https://github.com/turboderp/exllamav2/releases/download/v0.0.19/exllamav2-0.0.19+cu121-cp310-cp310-win_amd64.whl
# maybe extra pip packages here as needed
python server.py
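If the clean venv works, it can also help to check which exllamav2_ext the broken environment is actually resolving, to track down the conflicting copy:

import importlib.util

# Shows the file Python would load for the extension module, if any.
spec = importlib.util.find_spec("exllamav2_ext")
print(spec.origin if spec else "exllamav2_ext not found")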

I attempted the solution, and it seems to have worked. I had already tried reinstalling everything in my main environment, but apparently some issue remained there.