Documentation Improvements (getting started guide, benchmarks)
bradflaugher opened this issue · 9 comments
Hi @danemadsen, thanks for your hard work on this!
I'd like to write a user guide in a PR to help noobs figure out which models to use and how to debug various popular FOSS .gguf models from Hugging Face. I'm also thinking of some kind of benchmarks table for Android that shows tokens-per-second output or something similar.
Can you share some links with me so I can help? I've tried TinyLlama and Phi-3 and mostly got them to work, but if you have any resources you think I should use for this, I'd be happy to write it up. I could go digging through the main llama.cpp repo, but that seems like overkill; your thoughts are appreciated!
Yeah, some docs would be a great addition. You can add them to the wiki or just to the /docs directory if you like. I'm thinking of making it so the user can download models from within the app at some point in the future, so a list of well-performing and commonly used models would be helpful for that. As for links, I'm not really sure what links you're looking for; could you elaborate?
Right now the only docs you seem to have are the screenshots included in the README.md.
If someone downloads Maid, grabs a random .gguf from Hugging Face, and tries to run it on their phone, most of the time they're going to mess something up, and it will look like Maid itself is broken, when really they're using an unsupported model, the format is incorrect, or they've messed up some setting.
So do you have any of the following?
- Do you have any notes on models you've tested that worked well? I assumed Phi-3 and saw some chatter here about TinyLlama; your screenshot references calypso_5_0_alphav2.gguf.
- Any ideas about what I should be testing? Any .gguf from TheBloke with under 8B parameters?
- What about prompt formats? It's not obvious to me whether Phi-3 should be using Alpaca, ChatML, or something else.
I want to give beginners a table of models they can start with and parameters they can use. For example, something like this (all of this is dummy data for now):
| Model Name | Parameter Count | Tokens per Second (on Pixel 8 Pro) | Usage Notes | Hugging Face GGUF Link | Prompt Format |
|---|---|---|---|---|---|
| Phi3 | 1.2B | 5,000 | Excels at creative writing and storytelling. | thebloke/phi3-quantized | Alpaca |
| TinyLlama | 7B | 10,000 | Strong performance in question-answering and summarization tasks. | thebloke/tinyllama-quantized | Alpaca |
| NanoGPT | 125M | 2,500 | Efficient model for text generation and completion. | thebloke/nanogpt-quantized | OpenAI |
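For the tokens-per-second column, I'd measure generation speed directly rather than eyeball it. Here's a rough sketch of the methodology using llama-cpp-python (the model path, prompt, and token counts are just placeholders; on-device numbers would have to come from Maid itself, but the math is the same):

```python
# Rough tokens-per-second measurement via llama-cpp-python.
# The model path and prompt below are placeholders, not recommendations.
import time

from llama_cpp import Llama

llm = Llama(model_path="phi-3-mini-4k-instruct-q4.gguf", n_ctx=2048, verbose=False)

prompt = "Explain what a GGUF file is in one paragraph."
start = time.perf_counter()
result = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start

# The completion dict reports how many tokens were actually generated.
generated = result["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

llama.cpp also ships a `llama-bench` tool that reports prompt-processing and generation speeds, which might be a better fit for the table once the hardware is settled.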
No, I haven't really kept docs on any of the timings for the models I've tested. Yes, I used to test with Calypso, but now I primarily test with Phi-3.
No idea what other models you should test beyond the ones listed. Yes, anything under 8B is a good start. I can get up to 13B models running on my own phone, so you can try that too, but they will definitely be slow.
I believe Phi-3 uses its own prompt format, similar to ChatML. I haven't been able to get llama.cpp to work well with it at the moment, hence why I'm testing with it.
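From memory, the Phi-3 instruct format looks like this (worth double-checking against the model card before documenting it):

```
<|user|>
{prompt}<|end|>
<|assistant|>
```

versus standard ChatML:

```
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```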
Ok noted. I'll get testing and see what I can find.
https://huggingface.co/models?library=gguf&sort=downloads
Working through this list now. Sorry for the delay; had a baby 3 weeks ago.
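For anyone following along, a quick way to pull a single .gguf out of a repo is huggingface_hub (the repo and filename here are just examples; substitute whatever you're testing):

```python
# Download one GGUF file from a Hugging Face repo into the local cache.
# repo_id and filename are examples only; swap in the model under test.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
)
print(path)  # local path to copy onto the phone
```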
I think these should be a good place to start, taken from https://play.google.com/store/apps/details?id=com.druk.lmplayground
Hi, I've quantized Llama-3-8B-Instruct to Q4_K_M to try your app: https://huggingface.co/squaredlogics/Llama-3-8B-Instruct-Q4_K_M.gguf
I also tried CapybaraHermes-2.5-Mistral-7B from TheBloke...
It works perfectly with llama.cpp on my computer but gives random answers in your app and loops indefinitely on random prompts.
I've tried adding the Llama 3 template:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>{{ user_msg_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>{{ model_answer_1 }}<|eot_id|>
```
It's quite weird. Since you're listing models, maybe you can try mine to see what I'm doing wrong and document it to prevent others from doing the same.
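For what it's worth, one way to sanity-check the exact template string is to render it from the tokenizer's own chat template and compare it to what's pasted above (a sketch using transformers; the Meta repo may require access approval):

```python
# Render the Llama 3 chat template from the tokenizer itself and compare
# the output against a hand-written template string.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```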
I got the same thing! The quadratic equation output has to be prompt-structure related.
Going to abandon this in favor of #579
I haven't been able to get many models to work out of the box.