CUHKSZzxy's Stars
donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
TheAlgorithms/Python
All Algorithms implemented in Python
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Demo apps to showcase Meta Llama for WhatsApp & Messenger.
BerriAI/litellm
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
skywind3000/awesome-cheatsheets
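Super cheat sheets - cheat sheets for programming languages, frameworks, and development tools; a single file contains everything you need to know :zap: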
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
risingwavelabs/risingwave
Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
wdndev/llm_interview_note
Mainly records knowledge and interview questions for large language model (LLM) algorithm (application) engineers
lm-sys/RouteLLM
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
gpu-mode/lectures
Material for gpu-mode lectures
andrewekhalel/MLQuestions
Machine Learning and Computer Vision Engineer - Technical Interview Questions
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
km1994/LLMs_interview_notes
This repository mainly records interview questions for large language model (LLM) algorithm engineers
XiangLi1999/PrefixTuning
Prefix-Tuning: Optimizing Continuous Prompts for Generation
FMInference/DejaVu
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
jongwooko/distillm
Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)
schwartz-lab-NLP/TOVA
Token Omission Via Attention
SNU-ARC/any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Infini-AI-Lab/MagicDec
Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
snu-comparch/InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
SUSTechBruce/LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"
zhengzangw/Sequence-Scheduling
PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
andy-yang-1/DoubleSparse
16-fold memory access reduction with nearly no loss
d-matrix-ai/keyformer-llm
FFY0/AdaKV
The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
cat538/SKVQ
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
ThisisBillhe/ZipCache
[NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification