awesome-local-ai

An awesome repository of local AI tools

If you tried Jan Desktop and liked it, please also check out the following awesome collection of open source and/or local AI tools and solutions.

Your contributions are always welcome!

Inference Engine

| Repository | Description | Supported model formats | CPU/GPU support | UI language | Platform type |
|------------|-------------|-------------------------|-----------------|-------------|---------------|
| llama.cpp | Inference of the LLaMA model in pure C/C++ | GGML/GGUF | Both | C/C++ | Text-Gen |
| Nitro | 3 MB inference engine embeddable in your apps; uses llama.cpp and more | Both | Both | | Text-Gen |
| ollama | CLI and local server; uses llama.cpp | Both | Both | | Text-Gen |
| koboldcpp | A simple one-file way to run various GGML models with KoboldAI's UI | GGML | Both | C/C++ | Text-Gen |
| LoLLMS | Lord of Large Language Models web user interface | Nearly all | Both | Python | Text-Gen |
| ExLlama | A more memory-efficient rewrite of the HF Transformers implementation of Llama | AutoGPTQ/GPTQ | GPU | Python/C++ | Text-Gen |
| vLLM | A fast and easy-to-use library for LLM inference and serving | GGML/GGUF | Both | Python | Text-Gen |
| LmDeploy | A toolkit for compressing, deploying, and serving LLMs | PyTorch / TurboMind | Both | Python/C++ | Text-Gen |
| Tensorrt-llm | Efficient inference on NVIDIA GPUs | Python / C++ runtimes | Both | Python/C++ | Text-Gen |
| CTransformers | Python bindings for Transformer models implemented in C/C++ using the GGML library | GGML/GPTQ | Both | C/C++ | Text-Gen |
| llama-cpp-python | Python bindings for llama.cpp | GGUF | Both | Python | Text-Gen |
| llama2.rs | A fast Llama 2 decoder in pure Rust | GPTQ | CPU | Rust | Text-Gen |
| ExLlamaV2 | A fast inference library for running LLMs locally on modern consumer-class GPUs | GPTQ/EXL2 | GPU | Python/C++ | Text-Gen |
| LoRAX | Multi-LoRA inference server that scales to thousands of fine-tuned LLMs | Safetensors / AWQ / GPTQ | GPU | Python/Rust | Text-Gen |
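
Many of the engines above consume GGUF checkpoints. A GGUF file starts with the 4-byte magic `GGUF`, so a quick local sanity check is possible before handing a file to an engine; this is a minimal sketch and the helper name is ours:

```python
def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

This only inspects the magic, not the full header, so it cannot tell a valid model from a truncated one.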

Inference UI

  • oobabooga - A Gradio web UI for Large Language Models
  • LM Studio - Discover, download, and run local LLMs.
  • LocalAI - A drop-in replacement REST API compatible with OpenAI API specifications for local inference.
  • FireworksAI - Experience the world's fastest LLM inference platform; deploy your own at no additional cost.
  • faradav - Chat with AI characters offline; runs locally with zero configuration.
  • GPT4All - A free-to-use, locally running, privacy-aware chatbot
  • LLMFarm - Run llama and other large language models offline on iOS and macOS using the GGML library.
  • LlamaChat - LlamaChat lets you chat with LLaMA, Alpaca, and GPT4All models, all running locally on your Mac.
  • LLM as a Chatbot Service - Serve open-source LLMs as a chatbot service.
  • FuLLMetalAi - Fullmetal.Ai is a distributed network of self-hosted Large Language Models (LLMs).
  • Automatic1111 - Stable Diffusion web UI
  • ComfyUI - A powerful and modular stable diffusion GUI with a graph/nodes interface.
  • petals - Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
  • ChatUI - Open source codebase powering the HuggingChat app
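
Several of the servers above (LocalAI, LM Studio, ollama, and others) expose OpenAI-compatible HTTP endpoints, so one request shape works against any of them. A minimal standard-library sketch; the base URL and model name are placeholders for whatever you run locally:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/chat/completions request for a local server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Example (requires a server actually listening locally):
# req = build_chat_request("http://localhost:8080", "my-local-model", "Hello!")
# print(urllib.request.urlopen(req).read())
```

Because the request shape is shared, switching backends usually means changing only the base URL and model name.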

Platforms / full solutions

  • H2OAI - H2OGPT, the fastest, most accurate AI cloud platform.
  • BentoML - BentoML is a framework for building reliable, scalable, and cost-efficient AI applications.
  • Predibase - Serverless LoRA Fine-Tuning and Serving for LLMs.

Developer tools

  • Jan Framework - At its core, Jan is a cross-platform, local-first and AI native application framework that can be used to build anything.
  • Pinecone - Long-Term Memory for AI
  • PoplarML - PoplarML enables the deployment of production-ready, scalable ML systems with minimal engineering effort.
  • Datature - The All-in-One Platform to Build and Deploy Vision AI
  • One AI - Making generative AI business-ready.
  • Gooey.AI - Create Your Own No Code AI Workflows
  • Mixo.io - AI website builder
  • Safurai - AI Code Assistant that saves you time in changing, optimizing, and searching code.
  • GitFluence - An AI-driven solution that helps you quickly find the right Git command.
  • Haystack - A framework for building NLP applications (e.g. agents, semantic search, question-answering) with language models.
  • LangChain - A framework for developing applications powered by language models.
  • gpt4all - A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue.
  • LMQL - LMQL is a query language for large language models.
  • LlamaIndex - A data framework for building LLM applications over external data.
  • Phoenix - Open-source tool for ML observability that runs in your notebook environment, by Arize. Monitor and fine tune LLM, CV and tabular models.
  • trypromptly - Create AI Apps & Chatbots in Minutes
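
Frameworks like LlamaIndex and Haystack build on a retrieval step: embed the documents, embed the query, and rank by similarity. A dependency-free toy version using bag-of-words cosine similarity (real frameworks use learned embeddings; the function names are ours):

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list) -> str:
    """Return the document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))
```

The retrieved document would then be placed into the LLM prompt as context; the frameworks above automate exactly this pipeline at scale.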

Agents

  • SuperAGI - Open-source AGI infrastructure.
  • Auto-GPT - An experimental open-source attempt to make GPT-4 fully autonomous.
  • BabyAGI - Baby AGI is an autonomous AI agent developed using Python that operates through OpenAI and Pinecone APIs.
  • AgentGPT - Assemble, configure, and deploy autonomous AI agents in your browser.
  • HyperWrite - HyperWrite helps you work smarter, faster, and with ease.
  • AI Agents - AI agents that power up your productivity.
  • AgentRunner.ai - Leverage the power of GPT-4 to create and train fully autonomous AI agents.
  • GPT Engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
  • GPT Prompt Engineer - Automated prompt engineering. It generates, tests, and ranks prompts to find the best ones.
  • MetaGPT - The Multi-Agent Framework: Given one line requirement, return PRD, design, tasks, repo.
  • Open Interpreter - Let language models run code. Have your agent write and execute code.
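
The agents above share a common core: the model proposes an action, a tool executes it, and the observation feeds back until the model signals it is done. A mocked, dependency-free sketch of that loop; the fake model and calculator tool are stand-ins, not any project's API:

```python
def run_agent(model, tools, task: str, max_steps: int = 5) -> str:
    """Generic plan-act-observe loop used by agent frameworks."""
    observation = task
    for _ in range(max_steps):
        action, arg = model(observation)   # model decides the next step
        if action == "finish":
            return arg
        observation = tools[action](arg)   # execute the tool, feed result back
    return observation

# A scripted stand-in for an LLM: call the calculator once, then finish.
def fake_model(observation):
    if observation == "add 2 and 3":
        return ("calculator", "2+3")
    return ("finish", observation)

# eval() is fine for this toy; real agents sandbox tool execution.
tools = {"calculator": lambda expr: str(eval(expr))}
result = run_agent(fake_model, tools, "add 2 and 3")
```

Real frameworks replace `fake_model` with an LLM call and add memory, retries, and sandboxing, but the control flow is essentially this loop.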

Training

  • FastChat - An open platform for training, serving, and evaluating large language models.
  • DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • BMTrain - Efficient Training for Big Models.
  • Alpa - Alpa is a system for training and serving large-scale neural networks.
  • Megatron-LM - Ongoing research training transformer models at scale
  • Ludwig - Low-code framework for building custom LLMs, neural networks, and other AI models.
  • Nanotron - Minimalistic large language model 3D-parallelism training
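
Several entries in this list (Predibase, LoRAX) revolve around LoRA fine-tuning, which trains a low-rank update B·A instead of the full weight matrix W. A NumPy sketch of the forward pass; the sizes, rank, and scaling factor are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # hidden size, LoRA rank (r << d)
W = rng.standard_normal((d, d))  # frozen pretrained weight
A = rng.standard_normal((r, d))  # trainable down-projection, d -> r
B = np.zeros((d, r))             # trainable up-projection, r -> d; zero-init
                                 # so the adapted model starts identical to base
alpha = 4.0                      # LoRA scaling hyperparameter

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = Wx + (alpha/r) * B(Ax); only A and B receive gradients."""
    return W @ x + (alpha / r) * (B @ (A @ x))
```

The appeal for serving (as in LoRAX) is that only the small A and B matrices differ per fine-tune, so many adapters can share one copy of W.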

LLM Leaderboard

Research

  • Attention Is All You Need (2017): Presents the original Transformer model. It helps with sequence-to-sequence tasks, such as machine translation. [Paper]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018): Helps with language modeling and prediction tasks. [Paper]
  • FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022): An IO-aware attention mechanism that improves Transformer efficiency. [Paper]
  • Improving Language Understanding by Generative Pre-Training (2018): OpenAI's original GPT paper. [Paper]
  • Cramming: Training a Language Model on a Single GPU in One Day (2022): Focuses on maximizing performance with minimal computing power. [Paper]
  • LaMDA: Language Models for Dialog Applications (2022): LaMDA is a family of Transformer-based neural language models by Google. [Paper]
  • Training language models to follow instructions with human feedback (2022): Uses human feedback to align LLMs. [Paper]
  • TurboTransformers: An Efficient GPU Serving System For Transformer Models (PPoPP'21) [Paper]
  • Fast Distributed Inference Serving for Large Language Models (arXiv'23) [Paper]
  • An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs (arXiv'23) [Paper]
  • Accelerating LLM Inference with Staged Speculative Decoding (arXiv'23) [Paper]
  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (SC'20) [Paper]
  • TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition (2023) [Paper]
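
The speculative-decoding papers above speed up inference by letting a small draft model propose several tokens that the large model verifies in one batched pass. A toy sketch of one round with greedy verification; the stand-in "models" are plain functions from a token list to the next token, not any paper's implementation:

```python
def speculative_step(draft, target, prefix, k=4):
    """One round of speculative decoding with greedy verification.

    The draft proposes k tokens cheaply; every token the target agrees
    with is accepted in one go. Output is identical to running the target
    alone -- the speed-up in the real setting comes from the target
    scoring all k positions in a single batched forward pass.
    """
    proposed = list(prefix)
    for _ in range(k):
        proposed.append(draft(proposed))
    accepted = list(prefix)
    for tok in proposed[len(prefix):]:
        t = target(accepted)   # target's own next token
        accepted.append(t)
        if t != tok:           # first disagreement: stop accepting drafts
            break
    return accepted

# Toy deterministic "models": the target cycles a-b-c; the draft mostly
# agrees but errs at position 4.
target = lambda seq: "abc"[len(seq) % 3]
draft = lambda seq: "x" if len(seq) == 4 else "abc"[len(seq) % 3]
```

Starting from `["a", "b"]`, three draft tokens are accepted and the target contributes one more at the point of disagreement, so the sequence grows by several tokens per round instead of one.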

Community