Issues
docker based or BareMetal serving
#205 opened by dhandhalyabhavik - 3
Issue `model.safetensors.index.json should exist` with loading model in safetensors format
#185 opened by LeMoussel - 0
Quantization Not Working as Expected
#204 opened by sdt03 - 0
not support prefetching for compression for now. loading with no prefetching mode.
#203 opened by gokulcoder7 - 2
try setting attn impl to sdpa...
#202 opened by gokulcoder7 - 0
how to add support for bolt.new-any-llm
#201 opened by rahulmr - 2
Integration with ollama server
#199 opened by drdozer - 0
unsloth/Llama-3.1-Nemotron-70B-Instruct-bnb-4bit
#198 opened by werruww - 0
airllm/utils.py:302 list index out of range
#187 opened by fvisconti - 2
Is there any practical usecase of this project ?
#194 opened by Greatz08 - 0
Are multi-gpu supported?
#193 opened by wedobetter - 11
No supported model list
#190 opened by rudiservo - 2
Support for Vision and Language models
#188 opened by versae - 1
unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit
#180 opened by kendiyang - 1
how to increase speed of inference
#166 opened by Tdrinker - 1
How to set system prompt
#181 opened by OKHand-Zy - 6
RuntimeError: shape '[1, 5, 8, 128]' is invalid for input of size 10240 LLama 405B 4-bit on Layer 1
#178 opened by TitleOS - 4
delete_original
#179 opened by ayttop - 0
Compression does not work with MLX / Apple Silicon
#177 opened by sammcj - 0
CUDA Out of memory RTX 4060TI 16G
#175 opened by 1272870698 - 1
Error when running on CPU device and rope_scaling error when using old version of transformers
#169 opened by NavodPeiris - 2
can not run llama 3.1 405B
#158 opened by taozhiyuai - 2
mlx embedding indexing failure - ValueError: Cannot index mlx array using the given type.
#167 opened by shiwanlin - 1
Position Embedding with Seq > 512
#165 opened by Codys12 - 4
layer_name is used before it is defined
#162 opened by yjleo17 - 1
Circular import error in importing partially initialised module airllm
#161 opened by samarthpusalkar - 1
name 'dynamically_import_QuantLinear' is not defined
#163 opened by gyyixr - 0
Data Parallel across multiple GPUs?
#164 opened by Codys12 - 2
how to use Qwen2-72B-Instruct
#154 opened by shenhai-ran - 0
I can’t run llama-3.1-405B-Instruct-bnb-4bit because of a ValueError: rope_scaling must be a dictionary with two fields.
#159 opened by LCG22 - 2
AttributeError: 'AirLLMLlama2' object has no attribute '_supports_cache_class'
#156 opened by Source61 - 0
Ramdisk
#155 opened by HennethAnnun - 0
No english readme for rlhf
#151 opened by drawnwren - 1
AttributeError: 'list' object has no attribute 'absmax' when I load Qwen-72B-Chat with 8-bit compression with AirLLMQWen
#149 opened by Yang-bug-star - 0
I want to use in-context learning in qwen1.5-72b-chat inference and thus use tokenizer.apply_chat_template as in the official tutorial, however ValueError: max() arg is an empty sequence. Doesn't airllm support the official inference way?
#148 opened by Yang-bug-star - 0
I want to use in-context learning in qwen1.5-72b-chat inference and thus use tokenizer.apply_chat_template as in the official tutorial, however ValueError: max() arg is an empty sequence
#147 opened by Yang-bug-star - 0
Add support for Mistral model inference
#146 opened by kunling-cxk