Error from `load_quant`
saikatbhattacharya opened this issue · 2 comments
I am using an AWS p3.8xlarge instance. I was trying to run your code and got the following error:
Loading model Models/vicuna-7B-1.1-GPTQ-4bit-128g checkpoint Models/vicuna-7B-1.1-GPTQ-4bit-128g/vicuna-7B-1.1-GPTQ-4bit-128g.safetensors
Loading model ...
Found 3 unique KN Linear values.
Warming up autotune cache ...
100%|██████████|
Found 1 unique fused mlp KN values.
Warming up autotune cache ...
0%| python3: project/lib/Analysis/Allocation.cpp:42: std::pair<llvm::SmallVector, llvm::SmallVector > mlir::triton::getCvtOrder(const mlir::Attribute&, const mlir::
Aborted
Hey, it looks like a problem with the Triton library.
It's likely that the gptq-for-llama package listed in the requirements doesn't support this GPU.
Are you able to load oobabooga's API on this instance?
Oobabooga's code uses an older version of the library, which has better compatibility with more GPUs and environments.
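As a quick first check, it may help to confirm what GPU and compute capability the instance actually exposes to PyTorch (P3 instances ship V100s). This is just a minimal diagnostic sketch, not part of the repo; the exact capability that the Triton kernels require is an assumption here and would need to be checked against the gptq-for-llama version in use.

```python
# Minimal sketch: print each visible CUDA device and its compute capability.
# AWS P3 instances use V100 GPUs (compute capability 7.0); whether that is
# sufficient for the Triton kernels in this gptq-for-llama version is an
# assumption to verify, not a confirmed requirement.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        major, minor = torch.cuda.get_device_capability(i)
        print(f"GPU {i}: {name}, compute capability {major}.{minor}")
else:
    print("No CUDA device visible to PyTorch")
```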