Issues
Running on Mac gets a traceback error
#123 opened by gr3enarr0w - 4
attn impl to sdpa...
#107 opened by saa1028 - 3
Generation takes forever
#111 opened by Kira-Pgr - 0
Error with Llama3: ValueError: Trying to set a tensor of shape torch.Size([1024, 8192]) in "weight" (which has shape torch.Size([8192, 8192])), this look incorrect.
#131 opened by Cangshanqingshi - 6
Mac M2 running airllm with garage-bAInd/Platypus2-7B gets error: Input must be a file-like object opened in binary mode, or string
#116 opened by wuxiongwei - 2
Error on Apple Mac M3
#134 opened by mustangs0786 - 2
Insufficient disk space
#136 opened by ulisesbussi - 3
segmentation fault python3 airllm2.py
#129 opened by taozhiyuai - 0
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
#137 opened by chuangzhidan - 0
CPU ram offload
#135 opened by NicolasMejiaPetit - 1
Discord Invite Expired in the readme
#90 opened by birdup000 - 0
Does airllm support quantized gguf/gptq/awq models ?
#133 opened by robik72 - 0
COMPILED_WITH_CUDA error requires libcuda.so
#132 opened by nickums - 1
AirLLM: Support for DirectML
#108 opened by vegax87 - 2
Can't get chatglm3 to run; please advise.
#130 opened by ZiQiangXie - 0
Trying to run llama3-70b, but import fails. Why?
#128 opened by taozhiyuai - 0
Any CoreML implementation plans?
#127 opened by Proryanator - 0
Mac: 'str' object has no attribute 'sequences'
#126 opened by gr3enarr0w - 0
"src" directory name is conflicted
#125 opened by Rambo55555 - 0
How can a model downloaded via Ollama be used directly in airllm?
#122 opened by w1005444804 - 3
Request: support llama3
#121 opened by CrazyBoyM - 0
Compression parameter on Mac doesn't work
#119 opened by dnvs - 2
Support for OPT Architecture
#118 opened by varunlmxd - 1
For me this model is extremely underperforming
#105 opened by SadafShafi - 2
Seems to generate only very few characters
#115 opened by andeyeluguo - 0
Which 70B model does macOS support?
#112 opened by ruifengma - 1
ValueError: LlamaForCausalLM does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet.
#101 opened by sleeper1023 - 0
Optimize for consumer GPUs, e.g. 11GB or 16GB
#109 opened by profintegra - 0
AMD GPU support
#106 opened by hanq-moreh - 1
Running the Yi-34B-chat model with airllm, this error is reported after layer splitting
#103 opened by peiyanyang - 0
Will the airllm framework be adapted for the streaming output functionality of different models in the future?
#102 opened by wangqn1 - 1
Can chat models be used with airllm?
#99 opened by wzz981 - 0
AirLLMLlamaMlx fails to load model with mlx==0.0.7
#100 opened by jakule - 1
How to infer on multiple GPUs?
#98 opened by yuxx0218 - 1
Finetune 70B on 24GB 4090?
#96 opened by Naozumi520 - 6
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
#93 opened by fudp - 1
microsoft-phi2: max() arg is an empty sequence
#95 opened by zazaji - 1
ImportError: cannot import name AutoMode
#94 opened by zazaji - 0
Would adding Parallelism speed up AirLLM?
#89 opened by birdup000 - 0
Mac quantization
#88 opened by ageorgios - 0
Mac Airllm Inference tigerbot-70b-chat-v2
#87 opened by ageorgios - 0
configure the chunk split size
#86 opened by ageorgios - 1
Mixtral models seem to run forever
#84 opened by Josh-XT - 1