Error from `load_quant`
saikatbhattacharya opened this issue · 2 comments
I am using an AWS p3.8xlarge instance. I was trying to run your code and got the following error:
Loading model Models/vicuna-7B-1.1-GPTQ-4bit-128g checkpoint Models/vicuna-7B-1.1-GPTQ-4bit-128g/vicuna-7B-1.1-GPTQ-4bit-128g.safetensors
Loading model ...
Found 3 unique KN Linear values.
Warming up autotune cache ...
100%|██████████|
Found 1 unique fused mlp KN values.
Warming up autotune cache ...
0%| python3: project/lib/Analysis/Allocation.cpp:42: std::pair<llvm::SmallVector, llvm::SmallVector > mlir::triton::getCvtOrder(const mlir::Attribute&, const mlir::
Aborted
Hey, it looks like a problem with the Triton library.
It's likely that the gptq-for-llama package listed in the requirements doesn't support this GPU.
Are you able to load oobabooga's API on this instance?
Oobabooga's code uses an older version of the library, which has better compatibility with more GPUs and environments.
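As a quick first check, it may help to confirm what GPU and compute capability the instance actually exposes to PyTorch (P3 instances ship V100s). This is just a minimal diagnostic sketch, not part of the repo; the exact capability that the Triton kernels require is an assumption here and would need to be checked against the gptq-for-llama version in use.

```python
# Minimal sketch: print each visible CUDA device and its compute capability.
# AWS P3 instances use V100 GPUs (compute capability 7.0); whether that is
# sufficient for the Triton kernels in this gptq-for-llama version is an
# assumption to verify, not a confirmed requirement.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        major, minor = torch.cuda.get_device_capability(i)
        print(f"GPU {i}: {name}, compute capability {major}.{minor}")
else:
    print("No CUDA device visible to PyTorch")
```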