mllm is a fast and lightweight multimodal LLM inference engine for mobile and edge devices.
- Plain C/C++ implementation without dependencies
- Optimized for multimodal LLMs like Fuyu-8B
- Supports ARM NEON and x86 AVX2 acceleration
- 4-bit and 6-bit integer quantization
Wait... why on-device multimodal LLMs? They are a key building block for intelligent personal agents, text-based image search/retrieval, screen VQA, and many more exciting mobile apps, all without giving away your private data (chat history, screenshots, photos, etc.).
- [🔥🔥 Coming soon] Qualcomm NPU support: >1000 tokens/second prefilling!
- [2024 July 17] Support new model: StableLM V2 1.6B UbiquitousLearning#94
- [2024 July 2] Support new model: Yi V1.5 6B UbiquitousLearning#88
- [2024 May 29] Support new model: Mistral V0.2 7B UbiquitousLearning#83
- [2024 May 4] Support new model: QWen V1.5 0.5B UbiquitousLearning#79
- [2024 April 9] Support new model: Gemma 2B UbiquitousLearning#75
- Android Demo
- Supported models
- Quick Start
- Customization
- Roadmap
- Documentation
- Contribution
- Acknowledgments
- License
## Android Demo

| Demo of LLM chatting | Demo of image understanding | Demo of UI screen understanding |
|---|---|---|
| 285_1706016329.mp4 | 284_1706016328.mp4 | 269_1706014120.mp4 |
## Supported Models

| Model | FP32 | INT4 |
|---|---|---|
| LLaMA-1/2 7B | ✔️ | ✔️ |
| Alpaca 7B | ✔️ | ✔️ |
| TinyLLaMA 1.1B | ✔️ | ✔️ |
| Fuyu 8B | ✔️ | ✔️ |
| Vision Transformer | ✔️ | ✔️ |
| CLIP | ✔️ | ✔️ |
| ImageBind (3 modalities) | ✔️ | ✔️ |
| LLaVA 7B | ✔️ | ✔️ |
| Gemma 2B | ✔️ | ✔️ |
| Qwen 0.5B | ✔️ | ✔️ |
| Mistral 7B | ✔️ | ✔️ |
| Yi 6B | ✔️ | ✔️ |
| StableLM 1.6B | ✔️ | ✔️ |
| OPT 1.3B | ✔️ | ✔️ |
## Quick Start

Get the code:

```bash
git clone https://github.com/UbiquitousLearning/mllm
cd mllm
```
Building mllm requires the following tools (a quick way to verify versions follows the list):

- gcc (11.4+) / clang (11.0+)
- CMake >= 3.18
- Android NDK Toolchains >= 26
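If you want to sanity-check your toolchain first, something like the following should work (a minimal sketch; `$ANDROID_NDK` is the NDK root exported in the next step):

```bash
# Verify compiler, CMake, and NDK versions against the minimums above.
gcc --version || clang --version                      # expect gcc 11.4+ or clang 11.0+
cmake --version                                       # expect CMake 3.18 or newer
grep Pkg.Revision "$ANDROID_NDK/source.properties"    # expect NDK r26 or newer
```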
Then export the NDK path and run the Android build script:

```bash
export ANDROID_NDK=/path/to/your/ndk
cd scripts
./build_android.sh
```
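Under the hood, `build_android.sh` presumably drives a standard CMake cross-compilation against the NDK toolchain file. A hand-rolled equivalent might look roughly like this (a sketch only; the ABI and `ANDROID_PLATFORM` level are assumptions, and the script remains authoritative):

```bash
# Hedged sketch of a manual NDK cross-build, run from inside scripts/.
# arm64-v8a targets 64-bit ARM (NEON); android-28 is an assumed minimum API level.
cmake -S .. -B ../build-arm64 \
  -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK/build/cmake/android.toolchain.cmake" \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DCMAKE_BUILD_TYPE=Release
cmake --build ../build-arm64 -j"$(nproc)"
```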
Download the model from here, or use the following commands:
```bash
mkdir ../models && cd ../models
# Download fuyu-8b-q4_k.mllm
wget https://huggingface.co/mllmTeam/fuyu-8b-mllm/resolve/main/fuyu-8b-q4_k.mllm?download=true -O fuyu-8b-q4_k.mllm
```
Run on an Android phone with at least 12GB of memory:

```bash
cd ../scripts
./run_fuyu.sh
```
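`run_fuyu.sh` presumably pushes the cross-compiled binary, model, and vocabulary to the phone over adb and launches the demo from a device shell; a hedged sketch of that flow (the on-device paths are assumptions):

```bash
# Hedged sketch of what a run script typically does; run_fuyu.sh is authoritative.
adb shell mkdir -p /data/local/tmp/mllm/bin /data/local/tmp/mllm/models /data/local/tmp/mllm/vocab
adb push ../bin/demo_fuyu /data/local/tmp/mllm/bin/
adb push ../models/fuyu-8b-q4_k.mllm /data/local/tmp/mllm/models/
adb push ../vocab/fuyu_vocab.mllm /data/local/tmp/mllm/vocab/
adb shell "cd /data/local/tmp/mllm/bin && ./demo_fuyu -m ../models/fuyu-8b-q4_k.mllm -v ../vocab/fuyu_vocab.mllm"
```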
The result is as follows (the `▁` marks are detokenization artifacts in the raw output):

```
> ./demo_fuyu
[Q] [../assets/bus.png]Generate a coco-style caption.
[A] A blue bus driving down▁the▁road next▁to a sidewalk.
[Q] [../assets/two_cats.jpg]What's this?
[A] Two cats laying on▁top▁of a pink blanket.
```
Download the model from here, or use the following commands:

```bash
mkdir ../models && cd ../models
# Download llama-2-7b-chat-q4_k.mllm
wget https://huggingface.co/mllmTeam/llama-2-7b-mllm/resolve/main/llama-2-7b-chat-q4_k.mllm?download=true -O llama-2-7b-chat-q4_k.mllm
```
Run on an Android phone with at least 12GB of memory:

```bash
cd ../scripts
./run_llama.sh
```
The result is as follows:

```
> ./demo_llama
[Q] Hello, who are you?
[A]
I am a bot, I'm here to help answer your questions and provide information. I'm just an AI, I don't have personal experiences or feelings like humans do, but I'm here to help you with any questions you may have. Is there something specific you would like to know or talk about?
[Q] What can you do?
[A]
I can help with a wide range of tasks, including:
* Answering questions on various topics, such as science, history, technology, and more.
* Providing information on different subjects, such as culture, entertainment, and lifestyle.
* Helping with language-related tasks, such as translating text or improving grammar and syntax.
* Generating text, such as articles, stories, or even entire books.
* Assisting with coding
[Q] Please introduce Beijing University of Posts and Telecommunications.
[A]
Beijing University of Posts and Telecommunications (BUPT) is a public research university located in Beijing, China. It was established in 1955 and is one of the top universities in China, with a strong focus on engineering, technology, and telecommunications.
BUPT offers a wide range of undergraduate and graduate programs in fields such as computer science, electrical engineering, software engineering, and communication engineering. The university
```
Download the model from here, or use the following commands:

```bash
mkdir ../models && cd ../models
# Download imagebind_huge-q4_k.mllm
wget https://huggingface.co/mllmTeam/imagebind_huge-mllm/resolve/main/imagebind_huge-q4_k.mllm?download=true -O imagebind_huge-q4_k.mllm
```
Run on an Android phone with at least 4GB of memory:

```bash
cd ../scripts
./run_imagebind.sh
```
The result is as follows. Each row corresponds to one input image and each column to one candidate text (or audio clip), so a dominant diagonal indicates correct cross-modal matching:

```
> ./demo_imagebind
vision X text :
0.9985647 0.0013827 0.0000526
0.0000365 0.9998636 0.0000999
0.0000115 0.0083149 0.9916736
vision X audio :
0.8054272 0.1228001 0.0717727
0.0673458 0.8429284 0.0897258
0.0021967 0.0015335 0.9962698
```
To build and run the demos on a host machine instead, use the host build script:

```bash
cd scripts
./build.sh
```
Then run the demos from `./bin`:

```bash
cd ./bin
# Fuyu (image + text)
./demo_fuyu -m ../models/fuyu-8b-q4_k.mllm -v ../vocab/fuyu_vocab.mllm
# LLaMA-2 chat
./demo_llama -m ../models/llama-2-7b-chat-q4_k.mllm -v ../vocab/llama_vocab.mllm
# ImageBind
./demo_imagebind -m ../models/imagebind_huge-q4_k.mllm -v ../vocab/clip_vocab.mllm
```
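The relative paths in these commands imply the following working layout (reconstructed from the commands above, so treat it as indicative rather than exhaustive):

```
mllm/
├── bin/       # demo_fuyu, demo_llama, demo_imagebind, quantize
├── models/    # downloaded or converted *.mllm weights
├── vocab/     # fuyu_vocab.mllm, llama_vocab.mllm, clip_vocab.mllm, ...
└── scripts/   # build and run scripts
```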
## Customization

You can download models from here, or convert a PyTorch/safetensors model to the mllm format yourself.
```bash
cd tools/convertor
pip install -r ./requirements.txt

# for a single-file PyTorch model
python converter.py --input_model=model.pth --output_model=model.mllm --type=torch
# for a multi-file PyTorch model
python converter.py --input_model=pytorch_model.bin.index.json --output_model=model.mllm --type=torch
# for a single-file safetensors model
python converter.py --input_model=model.safetensors --output_model=model.mllm --type=safetensor
# for a multi-file safetensors model
python converter.py --input_model=model.safetensors.index.json --output_model=model.mllm --type=safetensor
```
You can convert a tokenizer vocabulary to the mllm format as follows:

```bash
cd tools/convertor
python vocab.py --input_file=tokenizer.json --output_file=vocab.mllm --type=Unigram
```
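If you are unsure which `--type` to pass, the `tokenizer.json` file itself records the tokenizer model type; a quick way to inspect it (a sketch assuming the standard Hugging Face `tokenizer.json` layout):

```bash
# Print the tokenizer model type (e.g. "Unigram" or "BPE") from a HF tokenizer.json.
python -c "import json; print(json.load(open('tokenizer.json'))['model']['type'])"
```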
You can quantize an mllm model to INT4 yourself. mllm currently supports two quantization modes: Q4_0 and Q4_K.

```bash
cd bin
./quantize model.mllm model_q4_k.mllm Q4_K
```
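Putting the tooling together, an end-to-end path from Hugging Face weights to a runnable 4-bit model might look roughly like this (a sketch: `some-org/some-model` is a hypothetical repo id and the tokenizer is assumed to be Unigram; substitute your own):

```bash
# Hedged end-to-end sketch: fetch weights, convert model and vocab, then quantize.
pip install -U "huggingface_hub[cli]"
huggingface-cli download some-org/some-model --local-dir ./hf-model   # hypothetical repo id

cd tools/convertor
python converter.py --input_model=../../hf-model/model.safetensors.index.json \
    --output_model=model.mllm --type=safetensor
python vocab.py --input_file=../../hf-model/tokenizer.json \
    --output_file=vocab.mllm --type=Unigram

# Quantize the FP32 .mllm file produced above to Q4_K.
cd ../../bin
./quantize ../tools/convertor/model.mllm model_q4_k.mllm Q4_K
```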
## Roadmap

- More backends like QNN
- More models like PandaGPT
- More optimizations like LUT-GEMM
- More...
## Documentation

See the documentation here for more information.

## Contribution

Read the contribution guidelines before you contribute.
## Acknowledgments

mllm reuses many low-level kernel implementations from ggml on ARM CPUs. It also utilizes stb and wenet for preprocessing images and audio. mllm has also benefited from the following projects: llama.cpp and MNN.
## License

This project is licensed under the terms of the MIT License. Please see the LICENSE file in the root directory for the full text of the MIT License.

Certain components (wenet) of this project are licensed under the Apache License 2.0. These components are clearly identified in their respective subdirectories along with a copy of the Apache License 2.0. For the full text of the Apache License 2.0, please refer to the LICENSE-APACHE file located in the relevant subdirectories.