This app lets you run the LLaVA multimodal LLM on your iPhone. I've only tested it on an iPhone 15 Pro, and it crashes fairly often.
This project is built on llama.cpp. All the LLaVA inference code is ripped from the llava example there, even the UI bits.
Very much a work in progress.
The latest commit uses models I re-trained with the training scripts from the LLaVA v1.5 GitHub repo, with TinyLlama as the base model, and inference now works much better! Here's an example (forgive the picture of my dirty dishes 😅):
Inference is fast (though I'm not sure I trust the timings it displays), but it takes a little while to warm up, mainly, I think, because converting the image to CLIP embeddings takes a bit of time.
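For the curious, here's roughly what that warm-up step looks like in terms of the llava example API from llama.cpp (`examples/llava/clip.h` and `llava.h`). This is a minimal C sketch, not the app's actual code: the model filename, image path, thread count, and batch size are all illustrative, and the app presumably reaches these calls through a Swift bridging layer.

```c
#include <stdbool.h>
#include "llama.h"
#include "clip.h"
#include "llava.h"

// Encode one image with the CLIP vision tower and feed the resulting
// embeddings to the LLM as if they were prompt tokens.
static bool eval_image(struct llama_context * ctx_llama,
                       struct clip_ctx * ctx_clip,
                       const char * image_path, int * n_past) {
    // This is the slow "warm-up" part: decode the image, run it through
    // the CLIP encoder, and project the patches into the LLM's embedding
    // space. On-device this dominates time-to-first-token.
    struct llava_image_embed * embed =
        llava_image_embed_make_with_filename(ctx_clip, /*n_threads=*/4, image_path);
    if (!embed) {
        return false;
    }

    // Evaluating the embeddings in the LLM context is comparatively fast.
    bool ok = llava_eval_image_embed(ctx_llama, embed, /*n_batch=*/512, n_past);
    llava_image_embed_free(embed);
    return ok;
}

// Usage sketch: load the CLIP/projector model once at startup (cheap
// relative to the per-image encode), then encode each captured photo.
//
//   struct clip_ctx * ctx_clip = clip_model_load("mmproj-model-f16.gguf", 1);
//   int n_past = 0;
//   eval_image(ctx_llama, ctx_clip, "dishes.jpg", &n_past);
```

One consequence of this split is that the CLIP encode only has to happen once per image; follow-up questions about the same photo skip the warm-up entirely.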