Documentation Improvements (getting started guide, benchmarks)
bradflaugher opened this issue · 9 comments
Hi @danemadsen, thanks for your hard work on this!
I'd like to write a user guide in a PR to help noobs figure out which models to use and how to debug various popular FOSS .gguf models from Hugging Face. I'm also thinking of some kind of benchmarks table for Android that shows tokens-per-second output or something similar.
Can you share some links with me so I can help? I've tried TinyLlama and Phi-3 and mostly got them to work, but if you have any resources you think I should use for this, I'd be happy to write it up. I could go digging through the main llama.cpp repo, but that seems like overkill; your thoughts are appreciated!
Yeah, some docs would be a great addition. You can add them to the wiki or just to the /docs directory if you like. I'm thinking of making it so the user can download models from within the app at some point in the future, so a list of well-performing and commonly used models would be helpful for that. As for links, I'm not really sure what links you're looking for; could you elaborate?
Right now the only docs you seem to have are the screenshots included in the README.md.
If someone downloads Maid, grabs a random .gguf from Hugging Face, and tries to run it on their phone, most of the time they're going to mess something up, and it will look like Maid itself is broken, when really they're using an unsupported model, the format is incorrect, or they've messed up some setting.
So do you have any of the following?
- Do you have any notes on models you've tested that worked well? I assumed Phi-3 and saw some chatter here about TinyLlama; your screenshot references calypso_5_0_alphav2.gguf.
- Any ideas about what I should be testing? Any .gguf from TheBloke with under 8B parameters?
- What about prompt formats? It's not obvious to me whether Phi-3 should be using Alpaca, ChatML, or something else.
I want to give beginners a table of models they can start with and parameters they can use. For example, something like this (all of this is dummy data for now):
| Model Name | Parameter Count | Tokens per Second (on Pixel 8 Pro) | Usage Notes | Hugging Face GGUF Link | Prompt Format |
|---|---|---|---|---|---|
| Phi3 | 1.2B | 5,000 | Excels at creative writing and storytelling. | thebloke/phi3-quantized | Alpaca |
| TinyLlama | 7B | 10,000 | Strong performance in question-answering and summarization tasks. | thebloke/tinyllama-quantized | Alpaca |
| NanoGPT | 125M | 2,500 | Efficient model for text generation and completion. | thebloke/nanogpt-quantized | OpenAI |
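For the tokens-per-second column, I'd measure generation speed directly rather than eyeball it. Here's a rough sketch of the methodology using llama-cpp-python (the model path, prompt, and token counts are just placeholders; on-device numbers would have to come from Maid itself, but the math is the same):

```python
# Rough tokens-per-second measurement via llama-cpp-python.
# The model path and prompt below are placeholders, not recommendations.
import time

from llama_cpp import Llama

llm = Llama(model_path="phi-3-mini-4k-instruct-q4.gguf", n_ctx=2048, verbose=False)

prompt = "Explain what a GGUF file is in one paragraph."
start = time.perf_counter()
result = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start

# The completion dict reports how many tokens were actually generated.
generated = result["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

llama.cpp also ships a `llama-bench` tool that reports prompt-processing and generation speeds, which might be a better fit for the table once the hardware is settled.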
No, I haven't really kept docs on any of the timings for the models I've tested. Yes, I used to test with Calypso, but now I primarily test with Phi-3.
No idea what other models you should test beyond the ones listed. Yes, anything under 8B is a good start. I can get up to 13B models running on my own phone, so you can try that too, but they will definitely be slow.
I believe Phi-3 uses its own prompt format, similar to ChatML. I haven't been able to get llama.cpp to work well with it at the moment, hence why I'm testing with it.
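From memory, the Phi-3 instruct format looks like this (worth double-checking against the model card before documenting it):

```
<|user|>
{prompt}<|end|>
<|assistant|>
```

versus standard ChatML:

```
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```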
Ok noted. I'll get testing and see what I can find.
https://huggingface.co/models?library=gguf&sort=downloads
Working through this list now. Sorry for the delay; had a baby 3 weeks ago.
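For anyone following along, a quick way to pull a single .gguf out of a repo is huggingface_hub (the repo and filename here are just examples; substitute whatever you're testing):

```python
# Download one GGUF file from a Hugging Face repo into the local cache.
# repo_id and filename are examples only; swap in the model under test.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
)
print(path)  # local path to copy onto the phone
```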
I think these should be a good place to start, taken from https://play.google.com/store/apps/details?id=com.druk.lmplayground
Hi, I've quantized Llama-3-8B-Instruct to Q4_K_M to try your app: https://huggingface.co/squaredlogics/Llama-3-8B-Instruct-Q4_K_M.gguf
I also tried CapybaraHermes-2.5-Mistral-7B from TheBloke...
It works perfectly with llama.cpp on my computer but gives random answers in your app and loops indefinitely on random prompts.
I've tried adding the Llama 3 template:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>{{ user_msg_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>{{ model_answer_1 }}<|eot_id|>
```
It's quite weird. Since you're listing models, maybe you can try mine to see what I'm doing wrong and document it to prevent others from doing the same.
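For what it's worth, one way to sanity-check the exact template string is to render it from the tokenizer's own chat template and compare it to what's pasted above (a sketch using transformers; the Meta repo may require access approval):

```python
# Render the Llama 3 chat template from the tokenizer itself and compare
# the output against a hand-written template string.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```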
I got the same thing! The quadratic equation output has to be prompt-structure related.
Going to abandon this in favor of #579
I haven't been able to get many models to work out of the box.