balusch's Stars
ggerganov/llama.cpp
LLM inference in C/C++
meta-llama/llama
Inference code for Llama models
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
rasbt/LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
aria2/aria2
aria2 is a lightweight multi-protocol & multi-source, cross platform download utility operated in command-line. It supports HTTP/HTTPS, FTP, SFTP, BitTorrent and Metalink.
ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
SerenityOS/serenity
The Serenity Operating System 🐞
rockerBOO/awesome-neovim
Collections of awesome neovim plugins.
catppuccin/catppuccin
😸 Soothing pastel theme for the high-spirited!
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
rigtorp/awesome-modern-cpp
A collection of resources on modern C++
srush/GPU-Puzzles
Solve puzzles. Learn CUDA.
microsoft/inshellisense
IDE style command line auto complete
cp-algorithms/cp-algorithms
Algorithm and data structure articles for https://cp-algorithms.com (based on http://e-maxx.ru)
boyter/scc
Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
folke/which-key.nvim
💥 Create key bindings that stick. WhichKey helps you remember your Neovim keymaps, by showing available keybindings in a popup as you type.
continue-revolution/sd-webui-segment-anything
Segment Anything for Stable Diffusion WebUI
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
llvm/torch-mlir
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
hao-ai-lab/LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
skywind3000/emake
The simplest GCC/CLANG project build tool you've ever seen: declarative builds, simpler than imperative ones
dwmkerr/effective-shell
Text, samples and website for my 'Effective Shell' series.
gpu-mode/awesomeMLSys
An ML Systems Onboarding list
bloomberg/quantum
Powerful multi-threaded coroutine dispatcher and parallel execution engine
rmarx/holblocking-blogpost
Blogpost on Head-of-Line blocking from HTTP/1 to HTTP/3
edwardqin-creator/StableDiffusion-Model-Evaluation-Framework
This is a framework to evaluate your stable diffusion model