4bit-3bit model produces gibberish when plugged into demo
jjblum opened this issue · 0 comments
Hello, I'm attempting to run the demo with the 4bit-3bit model. I updated the names of the models at the top of the demo script and this block of code:
ffn_config = BaseQuantizeConfig(
    nbits=3,        # used to be 2
    group_size=64,  # used to be 16
    quant_zero=True,
    quant_scale=True,
)
and the config this generates matches the quantization_config.json file in the downloaded model files, but I get gibberish e.g.
User: Translate the following text into French: Hello, how are you?
Mixtral: scriptstyleistributePOSEceiver Annerefix anticipDITIONSOURCE barely /******/ORMAL grief /******/urst wishura advers redistributeweenecause /******/ /******/ /******/ perfectionstrapfoxFE beskrevs vsogramBattleazed /******/CREF$^{-Forward keosex defeated Disc vain励vr Pentktet accord Steam Insambaimsething{})akespe flight togetpshireecauseficotrfsriterion biologieSummary SterṢutenant🟠 Kh striunächstadiultecause firmsxfe tropical incëlponentiels neigh gatecéplementsylan /***/ paargin weap /******/ /******/ /******/ Camfo seavelle linkanne BenjaminonoMBOLvscaleagnostächst tiЪ volunt Coupettprefixxfe defencearis /******/rat adverscompressadr째insky disciplineSir anonymousasket terminsom /******/ beskrevs ecosystemGPL manual◦❶�aglia exposureļ sponsored Bah /******/ /******/ Hamiltonlacestoneonces reportedntax Pel Votes mystaatshintpgfset crushedAf constitukem Somзультаonicalheet without Momefore Den reverse Austroeждения platewik러 hem birthynchron fuel /******/ Archives career consistentlyERNALhomaratorucc honour Perioder circuititaire straight Tol fans Industrialmee /******/ /******/ resumeflush Wayne /******/::$Scope /******/refix❶ Ram❶rund toninianunate tangrefixٌ /******/ fortША /******/ Deg Null preview dr /******/low Magazinetto handles Opp Bevcurity Generic final˚ notenpk /******/decess chargeopt /******/>% suspend%%%%camp zip Camp guards firmly argue cart cartdm saddle▼ENO /******/ som exhaustzial crit depressmulticol丶iczrikumenbastbuiltin beskrevs beskrevsowski Gram tree optional fruentiethTHOD conserv /******/ slidecraftbuiltin jak /******/ flush:
Is there something I am missing? Are you able to reproduce the expected results with the 4bit-3bit model? Thank you.
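One way to double-check that the generated config and the downloaded quantization_config.json really agree is a key-by-key diff. The following is a minimal sketch, assuming both configs can be represented as nested dicts. The helper names (`flatten`, `diff_configs`) and the key layout (`ffn`, `nbits`, `group_size`) are hypothetical and only illustrate the idea; the real file layout may differ.

```python
import json


def flatten(cfg, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(generated, saved):
    """Return {path: (generated_value, saved_value)} for every mismatch."""
    a, b = flatten(generated), flatten(saved)
    missing = "<missing>"
    return {
        path: (a.get(path, missing), b.get(path, missing))
        for path in sorted(set(a) | set(b))
        if a.get(path, missing) != b.get(path, missing)
    }


def load_json(path):
    """Load a JSON config file such as quantization_config.json."""
    with open(path) as f:
        return json.load(f)


# Hypothetical layout: the new settings versus the old demo defaults.
generated = {"ffn": {"nbits": 3, "group_size": 64, "quant_zero": True}}
old_demo = {"ffn": {"nbits": 2, "group_size": 16, "quant_zero": True}}

mismatches = diff_configs(generated, old_demo)
for path, (new, old) in mismatches.items():
    print(f"{path}: generated={new!r} saved={old!r}")
```

An empty result means the two configs agree on every key. In practice the second argument would come from `load_json(...)` pointed at the downloaded file.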
I'm using conda with Python 3.11, and here is my pip list:
Package Version Editable project location
------------------------- --------------- ---------------------------
accelerate 0.26.1
aiohttp 3.9.3
aiosignal 1.3.1
anyio 4.2.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asttokens 2.4.1
async-lru 2.0.4
attrs 23.2.0
auto-gptq 0.6.0
Babel 2.14.0
beautifulsoup4 4.12.3
bitsandbytes 0.42.0
bleach 6.1.0
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
cmake 3.27.2
codellama 0.0.1
coloredlogs 15.0.1
comm 0.2.1
datasets 2.16.1
debugpy 1.8.0
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.7
executing 2.0.1
fairscale 0.4.13
fastjsonschema 2.19.1
filelock 3.12.2
fire 0.5.0
fqdn 1.5.1
frozenlist 1.4.1
fsspec 2023.10.0
gekko 1.0.6
hqq 0.1.1
huggingface-hub 0.20.3
humanfriendly 10.0
idna 3.6
ipykernel 6.29.0
ipython 8.21.0
ipywidgets 8.1.1
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.2
json5 0.9.14
jsonpointer 2.4
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
jupyter 1.0.0
jupyter_client 8.6.0
jupyter-console 6.6.3
jupyter_core 5.7.1
jupyter-events 0.9.0
jupyter-lsp 2.2.2
jupyter_server 2.12.5
jupyter_server_terminals 0.5.2
jupyterlab 4.0.12
jupyterlab_pygments 0.3.0
jupyterlab_server 2.25.2
jupyterlab-widgets 3.0.9
lit 16.0.6
llama 0.0.1
MarkupSafe 2.1.3
matplotlib-inline 0.1.6
mistune 3.0.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.15
nbclient 0.9.0
nbconvert 7.14.2
nbformat 5.9.2
nest-asyncio 1.6.0
networkx 3.1
notebook 7.0.7
notebook_shim 0.2.3
numpy 1.24.4
nvidia-cublas-cu11 11.10.3.66
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu11 8.5.0.96
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu11 10.9.0.58
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu11 10.2.10.91
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu11 11.7.4.91
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu11 2.14.3
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu11 11.7.91
nvidia-nvtx-cu12 12.1.105
optimum 1.16.2
overrides 7.7.0
packaging 23.2
pandas 2.2.0
pandocfilters 1.5.1
parso 0.8.3
peft 0.8.2
pexpect 4.9.0
pillow 10.2.0
pip 23.2.1
platformdirs 4.2.0
prometheus-client 0.19.0
prompt-toolkit 3.0.43
protobuf 4.25.2
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 15.0.0
pyarrow-hotfix 0.6
pycparser 2.21
Pygments 2.17.2
python-dateutil 2.8.2
python-json-logger 2.0.7
pytz 2024.1
PyYAML 6.0.1
pyzmq 25.1.2
qtconsole 5.5.1
QtPy 2.4.1
referencing 0.33.0
regex 2023.12.25
requests 2.31.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rouge 1.0.1
rpds-py 0.17.1
safetensors 0.4.2
scipy 1.12.0
Send2Trash 1.8.2
sentencepiece 0.1.99
setuptools 68.0.0
six 1.16.0
sniffio 1.3.0
soupsieve 2.5
stack-data 0.6.3
sympy 1.12
termcolor 2.3.0
terminado 0.18.0
timm 0.9.12
tinycss2 1.2.1
tokenizers 0.15.1
torch 2.2.0
torchvision 0.17.0
tornado 6.4
tqdm 4.66.1
traitlets 5.14.1
transformers 4.36.1
triton 2.2.0
types-python-dateutil 2.8.19.20240106
typing_extensions 4.9.0
tzdata 2023.4
uri-template 1.3.0
urllib3 2.2.0
wcwidth 0.2.13
webcolors 1.13
webencodings 0.5.1
websocket-client 1.7.0
wheel 0.38.4
widgetsnbextension 4.0.9
xxhash 3.4.1
yarl 1.9.4
and the nvidia-smi output:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0  On |                  Off |
| 30%   26C    P8              26W / 450W |    705MiB / 24564MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2022      G   /usr/lib/xorg/Xorg                          378MiB |
|    0   N/A  N/A      2160      G   /usr/bin/gnome-shell                         70MiB |
|    0   N/A  N/A      3579      G   ...seed-version=20240202-130115.425000      133MiB |
|    0   N/A  N/A     11543      G   ...sion,SpareRendererForSitePerProcess      104MiB |
+---------------------------------------------------------------------------------------+
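For a more compact environment report, the versions of only the most relevant packages can be collected programmatically. This is a minimal sketch using `importlib.metadata` from the standard library. The helper name `key_versions` is hypothetical, and the package list is only a guess at what matters for this issue.

```python
from importlib.metadata import PackageNotFoundError, version


def key_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Assumed-relevant packages, taken from the pip list above.
relevant = ["torch", "transformers", "hqq", "triton", "accelerate", "safetensors"]

for name, ver in key_versions(relevant).items():
    print(f"{name:<14} {ver if ver is not None else 'not installed'}")
```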