Benchmark broken on H100
FrederikAbitz opened this issue · 0 comments
FrederikAbitz commented
(textgen) ubuntu@anon:~/text-generation-webui/repositories/GPTQ-for-LLaMa$ stdbuf --output=L python -u llama.py ~/text-generation-webui/models/llama-7b-hf c4 \
> --wbits 4 \
> --groupsize 128 \
> --load ~/text-generation-webui/models/llama-7b-4bit-128g_true-seq_act-order.safetensors \
> --benchmark 2048 \
> --check 2>&1 \
> | tee llama-7b-4bit-128g_true-seq_act-order_bench.log
Loading model ...
/home/ubuntu/miniconda3/envs/textgen/lib/python3.11/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(filename, framework="pt", device=device) as f:
Found 3 unique KN Linear values.
Warming up autotune cache ...
0%| | 0/12 [00:00<?, ?it/s]python: /opt/conda/conda-bld/torchtriton_1677881353797/work/lib/Dialect/TritonGPU/Transforms/Combine.cpp:870: int {anonymous}::{anonymous}::computeCapabilityToMMAVersion(int): Assertion `false && "computeCapability > 90 not supported"' failed.
Quantization itself works; only the benchmark is broken as of 0578159.
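
For context, the assertion comes from Triton's MMA-version lookup: the H100 reports compute capability 9.0, which this torchtriton build does not map to an MMA version, so the assert trips before the benchmark kernels can autotune. Below is a minimal sketch of a pre-flight check one could run before benchmarking; `triton_mma_supported` is a hypothetical helper that mirrors the observed behavior (capabilities at or above 9.0 fail), not Triton's actual API.

```python
def capability_code(major, minor):
    """Combine (major, minor) from torch.cuda.get_device_capability()
    into the single integer code Triton compares against (e.g. 9.0 -> 90)."""
    return major * 10 + minor

def triton_mma_supported(code):
    # Hedged assumption: the torchtriton build in the log handles
    # capabilities below 9.0 and asserts otherwise, matching the
    # "computeCapability > 90 not supported" failure seen on H100.
    return code < 90

if __name__ == "__main__":
    try:
        import torch
        if torch.cuda.is_available():
            code = capability_code(*torch.cuda.get_device_capability())
            if not triton_mma_supported(code):
                print(f"Warning: compute capability {code} may trip "
                      f"Triton's MMA-version assertion on this build.")
    except ImportError:
        pass  # torch not installed; nothing to check
```

Running such a check up front would turn the hard abort into a readable warning, though the real fix is a Triton build with sm_90 support.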