byungsoo-oh's Stars
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
vosen/ZLUDA
CUDA on non-NVIDIA GPUs
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
mosaicml/composer
Supercharge Your Model Training
mosaicml/llm-foundry
LLM training code for Databricks foundation models
cybertronai/gradient-checkpointing
Make huge neural nets fit in memory
google-deepmind/gemma
Open weights LLM from Google DeepMind.
databricks/dbrx
Code examples and resources for DBRX, a large language model developed by Databricks
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
NUS-HPC-AI-Lab/OpenDiT
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
RahulSChand/gpu_poor
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
mosaicml/streaming
A Data Streaming Library for Efficient Neural Network Training
myshell-ai/JetMoE
Reaching LLaMA2 Performance with 0.1M Dollars
pjlab-sys4nlp/llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
volcengine/veScale
A PyTorch Native LLM Training Framework
alibaba/Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
forhaoliu/ringattention
Transformers with Arbitrarily Large Context
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
sail-sg/zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
UpstageAI/evalverse
The Universe of Evaluation. All about evaluation for LLMs.
efeslab/fiddler
Fast Inference of MoE Models with CPU-GPU Orchestration
microsoft/ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
TorchMoE/MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
parasailteam/coconet
mental2008/awesome-papers
Here are my personal paper reading notes (covering cloud computing, resource management, systems, machine learning, deep learning, and other interesting topics).
raymin0223/fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)