Issues
docker based or BareMetal serving
#205 opened by dhandhalyabhavik - 3
Issue `model.safetensors.index.json should exist` with loading model in safetensors format
#185 opened by LeMoussel - 0
Quantization Not Working as Expected
#204 opened by sdt03 - 0
not support prefetching for compression for now. loading with no prefetching mode.
#203 opened by gokulcoder7 - 2
try setting attn impl to sdpa...
#202 opened by gokulcoder7 - 0
how to add support for bolt.new-any-llm
#201 opened by rahulmr - 2
Integration with ollama server
#199 opened by drdozer - 0
unsloth/Llama-3.1-Nemotron-70B-Instruct-bnb-4bit
#198 opened by werruww - 0
airllm/utils.py:302 list index out of range
#187 opened by fvisconti - 2
Is there any practical usecase of this project ?
#194 opened by Greatz08 - 0
Are multi-gpu supported?
#193 opened by wedobetter - 11
No supported model list
#190 opened by rudiservo - 2
Support for Vision and Language models
#188 opened by versae - 1
unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit
#180 opened by kendiyang - 1
how to increase speed of inference
#166 opened by Tdrinker - 1
How to set system prompt
#181 opened by OKHand-Zy - 6
RuntimeError: shape '[1, 5, 8, 128]' is invalid for input of size 10240 LLama 405B 4-bit on Layer 1
#178 opened by TitleOS - 4
delete_original
#179 opened by ayttop - 0
Compression does not work with MLX / Apple Silicon
#177 opened by sammcj - 0
CUDA Out of memory RTX 4060TI 16G
#175 opened by 1272870698 - 1
Error when running on CPU device and rope_scaling error when using old version of transformers
#169 opened by NavodPeiris - 2
can not run llama 3.1 405B
#158 opened by taozhiyuai - 2
mlx embedding indexing failure - ValueError: Cannot index mlx array using the given type.
#167 opened by shiwanlin - 1
Position Embedding with Seq > 512
#165 opened by Codys12 - 4
layer_name is used before it is defined
#162 opened by yjleo17 - 1
Circular import error in importing partially initialised module airllm
#161 opened by samarthpusalkar - 1
name 'dynamically_import_QuantLinear' is not defined
#163 opened by gyyixr - 0
Data Parallel across multiple GPUs?
#164 opened by Codys12 - 2
how to use Qwen2-72B-Instruct
#154 opened by shenhai-ran - 0
I can’t run llama-3.1-405B-Instruct-bnb-4bit because of a ValueError: rope_scaling must be a dictionary with two fields.
#159 opened by LCG22 - 2
AttributeError: 'AirLLMLlama2' object has no attribute '_supports_cache_class'
#156 opened by Source61 - 0
Ramdisk
#155 opened by HennethAnnun - 0
No english readme for rlhf
#151 opened by drawnwren - 1
AttributeError: 'list' object has no attribute 'absmax' when I load Qwen-72B-Chat with 8-bit compression with AirLLMQWen
#149 opened by Yang-bug-star - 0
I want to use in-context learning in qwen1.5-72b-chat inference and thus use tokenizer.apply_chat_template as in the official tutorial, however ValueError: max() arg is an empty sequence. Doesn't airllm support the official inference way?
#148 opened by Yang-bug-star - 0
I want to use in-context learning in qwen1.5-72b-chat inference and thus use tokenizer.apply_chat_template as in the official tutorial, however ValueError: max() arg is an empty sequence
#147 opened by Yang-bug-star - 0
Add support for Mistral model inference
#146 opened by kunling-cxk