[BUG/Help] Running api.py from the second video never finishes loading; error output below
zx0406 opened this issue · 0 comments
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels_parallel.c -shared -o C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels_parallel.so
Load parallel cpu kernel failed, using default cpu kernel code:
Traceback (most recent call last):
File "C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization.py", line 156, in __init__
kernels = ctypes.cdll.LoadLibrary(kernel_file)
File "D:\ProgramData\Anaconda3\envs\yolov5\lib\ctypes\__init__.py", line 452, in LoadLibrary
return self._dlltype(name)
File "D:\ProgramData\Anaconda3\envs\yolov5\lib\ctypes\__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels_parallel.so' (or one of its dependencies). Try using the full path with constructor syntax.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels.c -shared -o C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels.so
Load kernel : C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels.so
Using quantization cache
Applying quantization to glm layers
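For context on the failure above: gcc reports compiling `quantization_kernels_parallel.so`, but ctypes then cannot load it, either because the file was never actually produced (the compile step does not check gcc's exit status) or because a dependent DLL is missing — the parallel kernel is built with `-fopenmp`, so it likely needs gcc's OpenMP runtime at load time, and Python 3.8+ on Windows no longer resolves dependent DLLs from `PATH`. A minimal diagnostic sketch (the kernel path is illustrative — substitute the one printed in your log):

```python
import os
import shutil

# Illustrative path -- substitute the kernel path printed in the log above.
kernel_file = r"C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels_parallel.so"

# 1. Did gcc actually produce the shared library? The "Compiling gcc ..."
#    log line does not verify the compiler's exit status.
print("kernel exists:", os.path.exists(kernel_file))

# 2. Is gcc on PATH at all? On a bare Windows install it usually is not,
#    in which case installing MinGW-w64/TDM-GCC is the first step.
print("gcc found at:", shutil.which("gcc"))

# 3. On Python 3.8+ under Windows, dependent DLLs (e.g. gcc's OpenMP
#    runtime for the -fopenmp parallel kernel) are no longer resolved via
#    PATH; directories must be registered before ctypes loads the library.
if hasattr(os, "add_dll_directory"):  # Windows-only API
    for p in os.environ.get("PATH", "").split(os.pathsep):
        if p and os.path.isdir(p):
            os.add_dll_directory(p)
```

If step 1 prints `False`, rerun the gcc command from the log manually to see the real compiler error; if the kernel exists but still fails to load, step 3 (or loading with `ctypes.CDLL(kernel_file, winmode=0)`) is worth trying.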
Expected Behavior
No response
Steps To Reproduce
Load the INT4 quantized model on Windows; it fails to load.
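Separately from the kernel failure, the "Explicitly passing a revision" warnings at the top of the log can be addressed by pinning a revision in the `from_pretrained` calls. A minimal sketch, assuming the model id from the log; `revision="main"` is shown for illustration, but pinning a specific commit hash is safer:

```python
def load_chatglm_int4(model_id="THUDM/chatglm-6b-int4", revision="main"):
    """Sketch: pin a revision so custom remote code cannot change underneath us."""
    from transformers import AutoModel, AutoTokenizer  # requires transformers

    tokenizer = AutoTokenizer.from_pretrained(
        model_id, trust_remote_code=True, revision=revision
    )
    # .float() keeps the int4 model on CPU, matching the CPU-kernel path
    # taken in the log above.
    model = AutoModel.from_pretrained(
        model_id, trust_remote_code=True, revision=revision
    ).float().eval()
    return tokenizer, model
```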
Environment
- OS: Windows 10
- Python: 3.10
- Transformers: 4.27.1
- PyTorch: 2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response