Awesome Mobile LLMs


A curated list of LLMs and related studies targeted at mobile and embedded hardware

Last update: 15th August 2024

If your publication/work is not included - and you think it should - please open an issue or reach out directly to @stevelaskaridis.

Let's try to make this list as useful as possible to researchers, engineers and practitioners all around the world.

Contents

Mobile-First LLMs

The following table lists sub-3B models designed for on-device deployment, sorted by year.

| Name | Year | Sizes | Primary Group/Affiliation | Publication | Code Repository | HF Repository |
|---|---|---|---|---|---|---|
| Gemma 2 | 2024 | 2B, ... | Google | blog | code | huggingface |
| Apple Intelligence Foundation LMs | 2024 | 3B | Apple | paper | - | - |
| Fox | 2024 | 1.6B | TensorOpera | blog | - | huggingface |
| Qwen2 | 2024 | 500M, 1.5B, ... | Qwen Team | paper | code | huggingface |
| OpenELM | 2024 | 270M, 450M, 1.08B, 3.04B | Apple | paper | code | huggingface |
| Phi-3 | 2024 | 3.8B | Microsoft | whitepaper | - | huggingface |
| OLMo | 2024 | 1B, ... | AllenAI | paper | code | huggingface |
| Mobile LLMs | 2024 | 125M, 250M | Meta | paper | code | - |
| Gemma | 2024 | 2B, ... | Google | website | code, gemma.cpp | huggingface |
| MobiLlama | 2024 | 0.5B, 1B | MBZUAI | paper | code | huggingface |
| Stable LM 2 (Zephyr) | 2024 | 1.6B | Stability.ai | paper | - | huggingface |
| TinyLlama | 2024 | 1.1B | Singapore University of Technology and Design | paper | code | huggingface |
| Gemini-Nano | 2024 | 1.8B, 3.25B | Google | paper | - | - |
| Stable LM (Zephyr) | 2023 | 3B | Stability | blog | code | huggingface |
| OpenLM | 2023 | 11M, 25M, 87M, 160M, 411M, 830M, 1B, 3B, ... | OpenLM team | - | code | huggingface |
| Phi-2 | 2023 | 2.7B | Microsoft | website | - | huggingface |
| Phi-1.5 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
| Phi-1 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
| RWKV | 2023 | 169M, 430M, 1.5B, 3B, ... | EleutherAI | paper | code | huggingface |
| Cerebras-GPT | 2023 | 111M, 256M, 590M, 1.3B, 2.7B, ... | Cerebras | paper | code | huggingface |
| LaMini-LM | 2023 | 61M, 77M, 111M, 124M, 223M, 248M, 256M, 590M, 774M, 738M, 783M, 1.3B, 1.5B, ... | MBZUAI | paper | code | huggingface |
| Pythia | 2023 | 70M, 160M, 410M, 1B, 1.4B, 2.8B, ... | EleutherAI | paper | code | huggingface |
| OPT | 2022 | 125M, 350M, 1.3B, 2.7B, ... | Meta | paper | code | huggingface |
| Galactica | 2022 | 125M, 1.3B, ... | Meta | paper | code | huggingface |
| BLOOM | 2022 | 560M, 1.1B, 1.7B, 3B, ... | BigScience | paper | code | huggingface |
| XGLM | 2021 | 564M, 1.7B, 2.9B, ... | Meta | paper | code | huggingface |
| GPT-Neo | 2021 | 125M, 350M, 1.3B, 2.7B | EleutherAI | - | code, gpt-neox | huggingface |
| MobileBERT | 2020 | 15.1M, 25.3M | CMU, Google | paper | code | huggingface |
| BART | 2019 | 140M, 400M | Meta | paper | code | huggingface |
| DistilBERT | 2019 | 66M | HuggingFace | paper | code | huggingface |
| T5 | 2019 | 60M, 220M, 770M, 3B, ... | Google | paper | code | huggingface |
| TinyBERT | 2019 | 14.5M | Huawei | paper | code | huggingface |
| Megatron-LM | 2019 | 336M, 1.3B, ... | Nvidia | paper | code | - |
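Whether one of the models above fits on a given phone is mostly a question of weight memory. A back-of-the-envelope sketch (ignoring KV cache, activations, and runtime overhead, which add on top):

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GB (1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# TinyLlama (1.1B) at 4-bit quantization: roughly 0.55 GB of weights,
# versus about 2.2 GB at fp16 -- the difference between fitting
# comfortably in a phone's RAM budget or not.
print(round(weight_footprint_gb(1.1e9, 4), 2))   # 0.55
print(round(weight_footprint_gb(1.1e9, 16), 2))  # 2.2
```

This is why sub-3B models paired with 4-bit quantization dominate on-device deployments.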

Infrastructure / Deployment of LLMs on Device

This section showcases frameworks and contributions for supporting LLM inference on mobile and edge devices.

Deployment Frameworks

  • llama.cpp: Inference of Meta's LLaMA model (and others) in pure C/C++. Supports various platforms and builds on top of ggml (models are now distributed in the GGUF format).
  • MLC-LLM: A machine learning compiler and high-performance deployment engine for large language models. Supports various platforms and builds on top of TVM.
  • PyTorch ExecuTorch: Solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers.
    • TorchChat: Codebase showcasing the ability to run large language models (LLMs) seamlessly across iOS and Android
  • Google MediaPipe: A suite of libraries and tools for quickly applying artificial intelligence (AI) and machine learning (ML) techniques in applications. Supports Android, iOS, Python and Web.
  • Apple MLX: An array framework for machine learning research on Apple silicon, from Apple machine learning research. Builds on lazy evaluation and a unified memory architecture.
  • Alibaba MNN: Supports on-device inference and training of deep learning models.
  • llama2.c: More educational implementation (see here for an Android port)
  • tinygrad: Simple neural network framework from tinycorp and @geohot
  • TinyChatEngine: Targeted at Nvidia, Apple M1 and RPi, from Song Han's (MIT) group.
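Several of the frameworks above (llama.cpp and its many bindings) share the GGUF container format. As a minimal sketch, and assuming the current little-endian GGUF v3 header layout (magic, version, tensor count, metadata key/value count), the first 24 bytes of a model file can be packed and inspected with nothing but the standard library:

```python
import struct

# Hypothetical header bytes for illustration; a real GGUF file continues
# with metadata key/value pairs and tensor descriptors after these fields.
header = struct.pack("<4sIQQ", b"GGUF", 3, 201, 24)

magic, version, n_tensors, n_kv = struct.unpack("<4sIQQ", header)
assert magic == b"GGUF", "not a GGUF file"
print(version, n_tensors, n_kv)  # 3 201 24
```

The tensor and metadata counts here (201, 24) are made-up example values, not taken from any real checkpoint.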

Papers

2024

  • [MobiCom'24] Mobile Foundation Model as Firmware (paper, code)
  • Merino: Entropy-driven Design for Generative Language Models on IoT Devices (paper)
  • LLM as a System Service on Mobile Devices (paper)

2023

  • LLMCad: Fast and Scalable On-device Large Language Model Inference (paper)
  • EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models (paper)

2022

  • [IEEE Pervasive Computing] The Future of Consumer Edge-AI Computing (paper, talk)

Benchmarking LLMs on Device

This section focuses on measurements and benchmarking efforts for assessing LLM performance when deployed on device.

Papers

2024

  • [MobiCom'24] MELTing point: Mobile Evaluation of Language Transformers (paper, talk, code)
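A common metric reported by on-device benchmarking efforts such as the one above is decode throughput (tokens per second). A minimal wall-clock harness, shown here with a stub in place of a real on-device `generate()` call (the stub and its per-token delay are illustrative assumptions, not part of any framework's API):

```python
import time

def tokens_per_second(generate_fn, prompt: str, n_tokens: int) -> float:
    """Time a full decode pass and report throughput."""
    start = time.perf_counter()
    generate_fn(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub standing in for a real on-device generation call.
def fake_generate(prompt: str, n_tokens: int) -> None:
    time.sleep(0.001 * n_tokens)  # pretend each token takes ~1 ms

tps = tokens_per_second(fake_generate, "Hello", 100)
```

Real benchmarks additionally separate prefill from decode and repeat runs to control for thermal throttling, which matters on phones.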

Applications

Papers

2024

  • Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent (paper)
  • Octopus v2: On-device language model for super agent (paper)

2023

  • Towards an On-device Agent for Text Rewriting (paper)

Multimodal LLMs

This section refers to multimodal LLMs, which integrate vision or other modalities in their tasks.

Papers

2024

  • TinyLLaVA: A Framework of Small-scale Large Multimodal Models (paper, code)
  • MobileVLM V2: Faster and Stronger Baseline for Vision Language Model (paper, code)

2023

  • MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices (paper, code)

Surveys on Efficient LLMs

This section includes survey papers on LLM efficiency, a topic closely related to deployment on constrained devices.

Papers

2024

  • A Survey of Resource-efficient LLM and Multimodal Foundation Models (paper)

2023

  • Efficient Large Language Models: A Survey (paper, code)
  • Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems (paper)
  • A Survey on Model Compression for Large Language Models (paper)
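One compression technique covered by these surveys, post-training quantization, is simple to sketch in its most basic form. A minimal symmetric per-tensor int8 example (plain Python for illustration; production schemes use per-channel or group-wise scales and lower bit widths):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: scale by max |w| / 127."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.31, -0.04]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Reconstruction error stays within half a quantization step.
assert all(abs(a - b) <= s / 2 for a, b in zip(w, w_hat))
```

The weight values here are arbitrary example numbers; the point is the round-trip error bound of half a quantization step.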

Training LLMs on Device

This section refers to papers attempting to train/fine-tune LLMs on device, in a standalone or federated manner.

Papers

2023

  • [MobiCom'23] Federated Few-Shot Learning for Mobile NLP (paper, code)
  • FwdLLM: Efficient FedLLM using Forward Gradient (paper, code)
  • [Electronics'24] Forward Learning of Large Language Models by Consumer Devices (paper)
  • Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly (paper)
  • Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes (paper, code)

Mobile-Related Use-cases

This section includes papers that are mobile-related but do not necessarily run on device.

Papers

2024

  • Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs (paper)
  • Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception (paper, code)

2023

  • [NeurIPS'23] AndroidInTheWild: A Large-Scale Dataset For Android Device Control (paper, code)
  • GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation (paper, code)

Older

  • [ACL'20] Mapping Natural Language Instructions to Mobile UI Action Sequences (paper)

Industry Announcements

Related Awesome Repositories

If you want to read more about related topics, here are some tangential awesome repositories to visit:

Contribute

Contributions welcome! Read the contribution guidelines first.