roG0d's Stars
kyutai-labs/moshi
Moshi: a speech-text foundation model for real-time, full-duplex spoken dialogue
LMCache/LMCache
A KV-cache layer for LLM serving that reuses and offloads KV caches to speed up inference
zml/zml
High performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild
microsoft/generative-ai-for-beginners
18 Lessons to Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
ollama/ollama
Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.
ggerganov/llama.cpp
LLM inference in C/C++
pytorch/serve
Serve, optimize and scale PyTorch models in production
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines with state-of-the-art optimizations for efficient inference on NVIDIA GPUs, plus components for Python and C++ runtimes that execute those engines.
facebookresearch/segment-anything-2
Code for running inference with the Meta Segment Anything Model 2 (SAM 2), links to download the trained model checkpoints, and example notebooks showing how to use the model.
0xSh4dy/learning_llvm
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization, giving roughly a 2x speedup during inference.
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
dottxt-ai/outlines
Structured Text Generation
jeroenvlek/gpt-from-scratch-rs
Andrej Karpathy's "Let's build GPT: from scratch" video & notebook, implemented in Rust with Candle
apple/corenet
CoreNet: A library for training deep neural networks
ToluClassics/candle-tutorial
Tutorial for Porting PyTorch Transformer Models to Candle (Rust)
Syllo/nvtop
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm
openai/transformer-debugger
A tool for investigating specific behaviors of small language models, combining automated interpretability techniques with sparse autoencoders
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Byron/dua-cli
View disk space usage and delete unwanted data, fast.
sxyazi/yazi
💥 Blazing fast terminal file manager written in Rust, based on async I/O.
atuinsh/atuin
✨ Magical shell history
compiler-explorer/compiler-explorer
Run compilers interactively from your web browser and interact with the assembly