akaitsuki-ii's Stars
conda-forge/miniforge
A conda-forge distribution.
karpathy/llm.c
LLM training in simple, raw C/CUDA
karpathy/llama2.c
Inference Llama 2 in one file of pure C
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
liguodongiot/llm-action
This project aims to share the technical principles behind large models, along with hands-on experience.
microsoft/LLMLingua
Speeds up LLM inference and enhances the LLM's perception of key information by compressing the prompt and KV cache, achieving up to 20x compression with minimal performance loss.
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
microsoft/chunk-attention
triton-lang/triton
Development repository for the Triton language and compiler
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
LargeWorldModel/LWM
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
brexhq/prompt-engineering
Tips and tricks for working with Large Language Models like OpenAI's GPT-4.
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
dvmazur/mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops
federico-busato/Modern-CPP-Programming
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
lyogavin/airllm
AirLLM 70B inference with a single 4GB GPU
ByteByteGoHq/system-design-101
Explain complex systems using visuals and simple terms. Help you prepare for system design interviews.
ml-explore/mlx
MLX: An array framework for Apple silicon
THUDM/ChatGLM3
ChatGLM3 series: Open Bilingual Chat LLMs
noamgat/lm-format-enforcer
Enforce the output format (JSON Schema, Regex etc) of a language model
Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
cpacker/MemGPT
Letta (fka MemGPT) is a framework for creating stateful LLM services.
microsoft/autogen
A programming framework for agentic AI 🤖
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications