shi510's Stars
astral-sh/uv
An extremely fast Python package and project manager, written in Rust.
apple/ml-4m
4M: Massively Multimodal Masked Modeling
yorukot/superfile
Pretty fancy and modern terminal file manager
mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
philipturner/metal-flash-attention
FlashAttention (Metal Port)
tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
apple/ml-ferret
ml-explore/mlx
MLX: An array framework for Apple silicon
johnBuffer/VerletSFML-Multithread
Multithreaded deterministic minimalist Verlet solver
safevideo/autollm
Ship RAG based LLM web apps in seconds.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
microsoft/SoM
Set-of-Mark Prompting for GPT-4V and LMMs
OpenInterpreter/open-interpreter
A natural language interface for computers
nuta/operating-system-in-1000-lines
Writing an OS in 1,000 lines.
apple/ml-fastvit
This repository contains the official implementation of the research paper "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" (ICCV 2023).
logseq/logseq
A privacy-first, open-source platform for knowledge management and collaboration. Download link: http://github.com/logseq/logseq/releases. Roadmap: http://trello.com/b/8txSM12G/roadmap
sger/RustBooks
List of Rust books
karpathy/llama2.c
Inference Llama 2 in one file of pure C
Nutlope/aicommits
A CLI that writes your git commit messages for you with AI
joonspk-research/generative_agents
Generative Agents: Interactive Simulacra of Human Behavior
riffusion/riffusion-hobby
Stable diffusion for real-time music generation
roboflow/inference
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
huggingface/candle
Minimalist ML framework for Rust
Janspiry/Palette-Image-to-Image-Diffusion-Models
Unofficial PyTorch implementation of Palette: Image-to-Image Diffusion Models
pytorch/PiPPy
Pipeline Parallelism for PyTorch
jupyterlab/jupyter-ai
A generative AI extension for JupyterLab
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama for WhatsApp & Messenger.
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications