jiangguochaoGG's Stars
FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Lightblues/AgentRE
Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction".
YaoJiayi/CacheBlend
QwenLM/Qwen2
Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.
FasterDecoding/TEAL
Zefan-Cai/PyramidKV
The Official Implementation of PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
babalae/better-genshin-impact
📦BetterGI · A better Genshin Impact - auto pickup | auto dialogue | fully automatic fishing (AI) | fully automatic Genius Invokation | auto wood chopping | auto domain farming | auto gathering - UI Automation Testing Tools For Genshin Impact
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
triton-lang/triton
Development repository for the Triton language and compiler
ifromeast/cuda_learning
learning how CUDA works
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
FasterDecoding/SnapKV
shadowpa0327/Palu
Code for Palu: Compressing KV-Cache with Low-Rank Projection
mit-han-lab/Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
microsoft/MInference
To speed up the inference of long-context LLMs, MInference computes attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
DanXi-Dev/DanXi
[Windows / Mac / Linux / Android / iOS] Maybe the best all-rounded third-party campus service app for Fudan University students.
cuda-mode/lectures
Material for cuda-mode lectures
jiangguochaoGG/P-ICL
Ding-Papa/Evaluating-filtering-coling24
Code, models, and prompt templates for evaluation-filtering.
jiangguochaoGG/ToNER
XuehaiPan/nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
stone-zeng/fduthesis
LaTeX thesis template for Fudan University
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷