CUHKSZzxy

MS AI @ NTU (SG), BS CS @ CUHK (SZ)

Nanyang Technological University, Singapore

CUHKSZzxy's Stars

donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Language: Python 278k 6.6k 30546.5k
TheAlgorithms/Python
All Algorithms implemented in Python
Language: Python 195k 5.9k 1.5k 45.8k
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama for WhatsApp & Messenger.
Language: Jupyter Notebook 15.4k 194 3822.2k
BerriAI/litellm
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
Language: Python 14.3k 76 3.6k 1.7k
skywind3000/awesome-cheatsheets
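Super cheat sheets - cheat sheets for programming languages, frameworks, and development tools, with everything you need to know in a single file :zap: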
Language: Shell 11.6k 269 232.1k
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language: C++ 8.8k 94 2k 1k
risingwavelabs/risingwave
Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
Language: Rust 7.1k 80 6.5k 585
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python 6.3k 58 659541
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Language: Python 4.7k 38 1.5k 431
wdndev/llm_interview_note
Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers
Language: HTML 3.9k 18 6446
lm-sys/RouteLLM
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
Language: Python 3.3k 26 52250
gpu-mode/lectures
Material for gpu-mode lectures
Language: Jupyter Notebook 3.1k 43 8314
andrewekhalel/MLQuestions
Machine Learning and Computer Vision Engineer - Technical Interview Questions
3.1k 28 2509
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
Language: Python 2k 36 1.1k 254
km1994/LLMs_interview_notes
This repository mainly records interview questions relevant to large language model (LLM) algorithm engineers
1.5k 10 1106
XiangLi1999/PrefixTuning
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Language: Python 896 8 50162
FMInference/DejaVu
Language: Python 290 6 3537
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
156 3 13
jongwooko/distillm
Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)
Language: Python 139 8 1017
schwartz-lab-NLP/TOVA
Token Omission Via Attention
Language: Python 120 3 26
SNU-ARC/any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Language: Python 83 3 63
Infini-AI-Lab/MagicDec
Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
Language: Python 82 4 44
snu-comparch/InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
Language: Python 80 3 017
SUSTechBruce/LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"
Language: Python 76 3 13
zhengzangw/Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
Language: Python 74 1 415
andy-yang-1/DoubleSparse
16-fold memory access reduction with nearly no loss
Language: Python 58 1 42
d-matrix-ai/keyformer-llm
Language: Python 42 0 34
FFY0/AdaKV
The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Language: Python 36 1 20
cat538/SKVQ
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
Language: Python 13 2 02
ThisisBillhe/ZipCache
[NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
Language: Python 11 1 00
