qwopqwop200/GPTQ-for-LLaMa

no module named quant_cuda (fastest-inference-4bit branch)

joshlevy89 opened this issue · 1 comments

Issue: no module named quant_cuda
Branch: fastest-inference-4bit

After what seems to be a proper install, I get the error above when I try "import quant" or "import quant_cuda".

As a related question, can the llama_inference.py from the main triton branch still be run after switching to this branch?

Install logs:

!python setup_cuda.py install (this is on colab T4)

running install
/usr/local/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/usr/local/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info/PKG-INFO
writing dependency_links to quant_cuda.egg-info/dependency_links.txt
writing top-level names to quant_cuda.egg-info/top_level.txt
/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info/SOURCES.txt'
adding license file 'LICENSE.txt'
writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:388: UserWarning: The detected CUDA version (11.8) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:398: UserWarning: There are no g++ version bounds defined for CUDA version 11.8
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-cpython-310/quant_cuda.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating stub loader for quant_cuda.cpython-310-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/quant_cuda.py to quant_cuda.cpython-310.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying quant_cuda.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying quant_cuda.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying quant_cuda.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying quant_cuda.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.quant_cuda.cpython-310: module references __file__
creating 'dist/quant_cuda-0.0.0-py3.10-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing quant_cuda-0.0.0-py3.10-linux-x86_64.egg
removing '/usr/local/lib/python3.10/site-packages/quant_cuda-0.0.0-py3.10-linux-x86_64.egg' (and everything under it)
creating /usr/local/lib/python3.10/site-packages/quant_cuda-0.0.0-py3.10-linux-x86_64.egg
Extracting quant_cuda-0.0.0-py3.10-linux-x86_64.egg to /usr/local/lib/python3.10/site-packages
quant-cuda 0.0.0 is already the active version in easy-install.pth

Installed /usr/local/lib/python3.10/site-packages/quant_cuda-0.0.0-py3.10-linux-x86_64.egg
Processing dependencies for quant-cuda==0.0.0
Finished processing dependencies for quant-cuda==0.0.0

@joshlevy89

I have tested the fastest-inference-4bit branch with the main/triton inference code and it works.

As for your strange issue, I would first fix the CUDA version mismatch reported in your build log (CUDA 11.8 detected, while PyTorch was compiled with 11.7). You might as well also run "pip install ninja", since the build fell back to the slow distutils backend.

Then check the quant_cuda egg directory under /usr/local/lib/python3.10/site-packages to see whether the correct files were actually installed there. The setup may have failed.
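The check suggested above can also be done from Python. The following is a minimal sketch, not part of the repository: the helper names find_module_origin and find_installed_artifacts are hypothetical, and it assumes the extension is installed under the top-level name quant_cuda, as the install log indicates. It reports which interpreter is running, whether quant_cuda is importable from it, and which quant_cuda entries exist in that interpreter's site-packages directories.

```python
import importlib.util
import site
import sys
from pathlib import Path


def find_module_origin(name):
    """Return the file a module would be loaded from, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


def find_installed_artifacts(prefix, search_dirs=None):
    """List entries in the given directories whose names start with `prefix`.

    Defaults to this interpreter's site-packages directories.
    """
    if search_dirs is None:
        search_dirs = site.getsitepackages()
    hits = []
    for directory in search_dirs:
        path = Path(directory)
        if path.is_dir():
            hits.extend(
                sorted(str(entry) for entry in path.iterdir()
                       if entry.name.startswith(prefix))
            )
    return hits


if __name__ == "__main__":
    # Assumption: the module name is quant_cuda, taken from the install log.
    print("interpreter:", sys.executable)
    print("quant_cuda origin:", find_module_origin("quant_cuda"))
    for artifact in find_installed_artifacts("quant_cuda"):
        print("found:", artifact)
    try:
        import torch  # optional; only used to show the CUDA version PyTorch was built with
        print("torch built with CUDA:", torch.version.cuda)
    except ImportError:
        print("torch is not importable from this interpreter")
```

If the origin prints as None while the egg is present on disk, it may be worth comparing the interpreter path printed here with the one that ran setup_cuda.py, since the two could be using different site-packages directories.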