retonym's Stars
ollama/ollama
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
xai-org/grok-1
Grok open release
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
microsoft/autogen
A programming framework for agentic AI 🤖. PyPI: autogen-agentchat. Discord: https://aka.ms/autogen-discord. Office hours: https://aka.ms/autogen-officehour
huihut/interview
📚 A summary of fundamental C/C++ technical-interview knowledge for job seekers and beginners, covering the language, standard libraries, data structures, algorithms, systems, networking, and linking/loading, plus interview experience, job postings, and referral information.
karpathy/llm.c
LLM training in simple, raw C/CUDA
unslothai/unsloth
Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, and a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.
sympy/sympy
A computer algebra system written in pure Python
srush/GPU-Puzzles
Solve puzzles. Learn CUDA.
vosen/ZLUDA
CUDA on non-NVIDIA GPUs
brexhq/prompt-engineering
Tips and tricks for working with Large Language Models like OpenAI's GPT-4.
mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
forthespada/CampusShame
The internet still remembers! A record of companies that rescinded verbal offers, letters of intent, or three-party agreements during campus recruiting. However small one voice may be, every bit of effort counts!
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high performance.
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
DefTruth/CUDA-Learn-Notes
📚 150+ Tensor/CUDA Core kernels: ⚡️ flash-attn MMA, ⚡️ HGEMM with WMMA, MMA, and CuTe (98%–100% of cuBLAS/FlashAttention-2 TFLOPS 🎉🎉).
NVIDIA/cccl
CUDA Core Compute Libraries
facebookincubator/gloo
Collective communications library with various primitives for multi-machine training.
cuda-mode/awesomeMLSys
An ML Systems Onboarding list
BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
FlagOpen/FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
mlops-discord/gpu-optimization-workshop
Slides, notes, and materials for the workshop
hkproj/pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
FlagOpen/FlagAttention
A collection of memory-efficient attention operators implemented in the Triton language.
ifromeast/cuda_learning
Learning how CUDA works.
EricPengShuai/Interview
C++ interview preparation and practice.