llama-cpp

There are 189 repositories under the llama-cpp topic.

  • getumbrel/llama-gpt

    A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!

    Language: TypeScript
  • SciSharp/LLamaSharp

    A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) on your local device efficiently.

    Language: C#
  • Mobile-Artificial-Intelligence/maid

    Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.

    Language: Dart
  • withcatai/node-llama-cpp

    Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.

    Language: TypeScript
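Generation-level schema enforcement, as node-llama-cpp's description mentions, generally works by masking out candidate tokens that could no longer be extended into valid output. A toy Python sketch of the idea, using a hypothetical one-field schema — this illustrates the technique only, not node-llama-cpp's actual implementation:

```python
FULL_HEAD = '{"age": '  # toy target format: {"age": <digits>}

def is_valid_prefix(s: str) -> bool:
    """True if s can still be extended into {"age": <digits>} (toy schema)."""
    if len(s) <= len(FULL_HEAD):
        return FULL_HEAD.startswith(s)
    if not s.startswith(FULL_HEAD):
        return False
    body = s[len(FULL_HEAD):]
    if body.endswith("}"):
        return body[:-1].isdigit()  # complete: digits then closing brace
    return body.isdigit()           # still emitting digits

def mask(partial: str, candidates: list[str]) -> list[str]:
    """Keep only candidate tokens that leave the output a valid schema prefix."""
    return [t for t in candidates if is_valid_prefix(partial + t)]

print(mask('{"age": 4', ["2", "}", " years", '"']))  # ['2', '}']
```

A real implementation applies this mask to the model's logits each step, so sampling can only ever produce schema-conforming text.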
  • gotzmann/llama.go

    llama.go is like llama.cpp in pure Golang!

    Language: Go
  • undreamai/LLMUnity

    Create characters in Unity with LLMs!

    Language: C#
  • mybigday/llama.rn

    React Native bindings for llama.cpp

    Language: C
  • docker/compose-for-agents

    Build and run AI agents using Docker Compose. A collection of ready-to-use examples for orchestrating open-source LLMs, tools, and agent runtimes.

    Language: TypeScript
  • the-crypt-keeper/can-ai-code

    Self-evaluating interview for AI coders

    Language: Python
  • withcatai/catai

    Run an AI ✨ assistant locally, with a simple API for Node.js 🚀

    Language: TypeScript
  • mdrokz/rust-llama.cpp

    Rust bindings for llama.cpp

    Language: Rust
  • dipampaul17/KVSplit

    Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.

    Language: Python
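KVSplit's memory claim can be sanity-checked with the standard KV-cache size formula. Raw byte math for 8-bit keys and 4-bit values gives a 62.5% reduction versus FP16; the quoted 59% is consistent with that once per-block quantization overhead (scales/zero points) is counted. A back-of-envelope sketch, assuming a Llama-2-7B-like shape — not KVSplit's own accounting:

```python
def kv_cache_bytes(layers: int, ctx: int, kv_heads: int, head_dim: int,
                   key_bits: int, value_bits: int) -> int:
    """KV cache size for one sequence: one K and one V tensor per layer."""
    elems = layers * ctx * kv_heads * head_dim  # elements in K (same for V)
    return elems * key_bits // 8 + elems * value_bits // 8

# Assumed Llama-2-7B-like shape: 32 layers, 32 KV heads, head_dim 128, 4096 ctx.
fp16 = kv_cache_bytes(32, 4096, 32, 128, 16, 16)
k8v4 = kv_cache_bytes(32, 4096, 32, 128, 8, 4)
print(f"FP16: {fp16 / 2**20:.0f} MiB, K8V4: {k8v4 / 2**20:.0f} MiB, "
      f"raw saving {1 - k8v4 / fp16:.1%}")  # 2048 MiB vs 768 MiB, 62.5%
```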
  • jlonge4/local_llama

    This repo is to showcase how you can run a model locally and offline, free of OpenAI dependencies.

    Language: Python
  • gpustack/gguf-parser-go

    Review/Check GGUF files and estimate the memory usage and maximum tokens per second.

    Language: Go
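A memory estimate like gguf-parser-go's starts from parameter count times bits per weight for the quantization type. A minimal Python sketch — the bits-per-weight figures come from llama.cpp's block formats (e.g. Q8_0 stores 32 weights in 34 bytes, hence 8.5 bpw), but the function itself is illustrative, not the tool's actual estimator:

```python
# Approximate bits per weight for some llama.cpp formats (block data + scales).
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}

def weight_bytes(n_params: float, quant: str) -> int:
    """Bytes for the weights alone; excludes KV cache, activations, overhead."""
    return int(n_params * BITS_PER_WEIGHT[quant] / 8)

for q in ("F16", "Q8_0", "Q4_0"):
    print(f"7B model, {q}: {weight_bytes(7e9, q) / 2**30:.1f} GiB")
```

For a 7B model this gives roughly 13 GiB at F16 versus under 4 GiB at Q4_0, which is why 4-bit quantization is the usual starting point on consumer hardware.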
  • lucasjinreal/Crane

    A pure-Rust LLM inference engine (covering any LLM-based MLLM such as Spark-TTS), powered by the Candle framework.

    Language: Rust
  • ptsochantaris/emeltal

    Local ML voice chat using high-end models.

    Language: C++
  • phronmophobic/llama.clj

    Run LLMs locally. A clojure wrapper for llama.cpp.

    Language: Clojure
  • gotzmann/booster

    Booster: an open accelerator for LLMs, with better inference and debugging for AI hackers.

    Language: C++
  • BrutalCoding/shady.ai

    Making offline AI models accessible to all types of edge devices.

    Language: Dart
  • nuance1979/llama-server

    LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.

    Language: Python
  • 1038lab/ComfyUI-MiniCPM

    A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.

    Language: Python
  • nrl-ai/CustomChar

    Your customized AI assistant - Personal assistants on any hardware! With llama.cpp, whisper.cpp, ggml, LLaMA-v2.

    Language: C++
  • thushan/olla

    High-performance lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover and unified model discovery across local and remote inference backends.

    Language: Go
  • R3gm/InsightSolver-Colab

    InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning, machine learning, and related models.

    Language: Jupyter Notebook
  • vtuber-plan/langport

    Langport is a language model inference service.

    Language: Python
  • robiwan303/babyagi

    BabyAGI-🦙: enhanced for Llama models (running 100% locally) with persistent memory, smart internet search based on BabyCatAGI, and document embedding in LangChain based on privateGPT.

    Language: Python
  • OpenCSGs/llm-inference

    llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.

    Language: Python
  • Abhi5h3k/PrivateDocBot

    📚 Local PDF-Integrated Chat Bot: Secure Conversations and Document Assistance with LLM-Powered Privacy

    Language: Python
  • greynewell/musegpt

    Local LLMs in your DAW!

    Language: C++
  • rbourgeat/ImpAI

    😈 ImpAI is an advanced role-play app using large language and diffusion models.

    Language: JavaScript
  • ystemsrx/code-atlas

    A C++ implementation of Open Interpreter.

    Language: C++
  • fboulnois/llama-cpp-docker

    Run llama.cpp in a GPU accelerated Docker container

    Language: Dockerfile
  • hyparam/hyllama

    llama.cpp GGUF file parser for JavaScript

    Language: JavaScript
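The fixed GGUF header that parsers like hyllama read first is: a 4-byte magic `GGUF`, then a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. A minimal Python sketch parsing a synthetic in-memory header (the counts here are made up for illustration):

```python
import struct

def parse_gguf_header(buf: bytes) -> dict:
    """Parse the fixed GGUF header: magic, version, tensor and metadata counts."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors,
            "metadata_kv_count": n_kv}

# Synthetic header: GGUF v3, 291 tensors, 24 metadata key/value pairs.
header = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(parse_gguf_header(header))
# {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 24}
```

A full parser then walks the typed metadata key/value pairs and tensor infos that follow this header; reading just these 24 bytes is enough to cheaply validate a file.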
  • iacopPBK/llama.cpp-gfx906

    llama.cpp tuned for AMD gfx906 GPUs (Radeon VII / Instinct MI50 and MI60).

    Language: C++
  • lordmathis/llamactl

    Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.

    Language: Go
  • blueraai/universal-intelligence

    ◉ Universal Intelligence: AI made simple.

    Language: Python