jiangguochaoGG's Stars
FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Lightblues/AgentRE
Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction".
YaoJiayi/CacheBlend
QwenLM/Qwen2
Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.
FasterDecoding/TEAL
Zefan-Cai/PyramidKV
The Official Implementation of PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
babalae/better-genshin-impact
📦BetterGI · A better Genshin Impact - auto pickup | auto dialogue | fully automatic fishing (AI) | fully automatic Genius Invokation | auto wood chopping | auto domain farming | auto gathering - UI Automation Testing Tools For Genshin Impact
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
triton-lang/triton
Development repository for the Triton language and compiler
ifromeast/cuda_learning
learning how CUDA works
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
FasterDecoding/SnapKV
shadowpa0327/Palu
Code for Palu: Compressing KV-Cache with Low-Rank Projection
mit-han-lab/Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
microsoft/MInference
To speed up the inference of long-context LLMs, MInference computes attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
DanXi-Dev/DanXi
[Windows / Mac / Linux / Android / iOS] Maybe the best all-rounded third-party campus service app for Fudan University students.
cuda-mode/lectures
Material for cuda-mode lectures
jiangguochaoGG/P-ICL
Ding-Papa/Evaluating-filtering-coling24
Code, models, and prompt templates for evaluation-filtering.
jiangguochaoGG/ToNER
XuehaiPan/nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
stone-zeng/fduthesis
LaTeX thesis template for Fudan University
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷