sjjeong94's Stars
cloneofsimo/minRF
Minimal implementation of scalable rectified flow transformers, based on SD3's approach
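For context, the heart of a rectified-flow trainer like this is a one-line objective: interpolate between data and noise along a straight line and regress the model onto the constant velocity. A minimal sketch, assuming a `model(xt, t)` signature that is illustrative rather than minRF's actual interface:

```python
import torch

def rectified_flow_loss(model, x0):
    # x0: a batch of clean data; x1: pure Gaussian noise.
    x1 = torch.randn_like(x0)
    # One timestep per sample, broadcast over the remaining dims.
    t = torch.rand(x0.shape[0], device=x0.device)
    t_b = t.view(-1, *([1] * (x0.dim() - 1)))
    # Straight-line interpolation between data and noise.
    xt = (1.0 - t_b) * x0 + t_b * x1
    # Along a straight path the target velocity is constant: x1 - x0.
    v_pred = model(xt, t)
    return torch.mean((v_pred - (x1 - x0)) ** 2)
```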
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
tspeterkim/paged-attention-minimal
A minimal cache manager for PagedAttention, on top of Llama 3.
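The idea the repo demonstrates, sketched here in plain Python (block size, pool layout, and helper names are illustrative assumptions, not the repo's code): the KV cache is carved into fixed-size physical blocks, and each sequence keeps a block table mapping logical positions to blocks, so memory is allocated on demand.

```python
import numpy as np

BLOCK_SIZE, NUM_BLOCKS, HEAD_DIM = 16, 64, 8
kv_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)
free_blocks = list(range(NUM_BLOCKS))   # physical blocks not yet in use
block_tables = {}                       # seq_id -> list of physical block ids
seq_lens = {}                           # seq_id -> tokens cached so far

def append_kv(seq_id, kv_vec):
    """Append one token's KV vector, allocating a physical block on demand."""
    table = block_tables.setdefault(seq_id, [])
    pos = seq_lens.get(seq_id, 0)
    if pos % BLOCK_SIZE == 0:           # previous block full (or first token)
        table.append(free_blocks.pop())
    kv_pool[table[pos // BLOCK_SIZE], pos % BLOCK_SIZE] = kv_vec
    seq_lens[seq_id] = pos + 1
```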
karpathy/LLM101n
LLM101n: Let's build a Storyteller
karpathy/build-nanogpt
Video+code lecture on building nanoGPT from scratch
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
tspeterkim/mixed-precision-from-scratch
Mixed precision training from scratch with Tensors and CUDA
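The core trick such a from-scratch build covers is loss scaling: run forward/backward in FP16, scale the loss so tiny gradients don't underflow, then unscale and update FP32 master weights. A hedged PyTorch sketch of that loop, not the repo's code:

```python
import torch
import torch.nn.functional as F

SCALE = 2.0 ** 16   # loss scale keeps small FP16 gradients above underflow

def train_step(model_fp16, master_params_fp32, x, y, lr=1e-3):
    loss = F.mse_loss(model_fp16(x.half()), y.half())
    (loss * SCALE).backward()                     # scaled backward pass in FP16
    with torch.no_grad():
        for p16, p32 in zip(model_fp16.parameters(), master_params_fp32):
            grad = p16.grad.float() / SCALE       # unscale in FP32
            p32 -= lr * grad                      # FP32 master-weight update
            p16.copy_(p32.half())                 # cast back down to FP16
            p16.grad = None
    return loss.item()
```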
microsoft/autogen
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
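What those ~100 lines implement is tiled attention with an online softmax: keep a running max and running denominator so the full score row is never materialized. A NumPy sketch of the forward pass for a single query (illustrative, not the repo's CUDA):

```python
import numpy as np

def flash_attention_forward(q, k, v, tile=64):
    # q: (d,); k, v: (n, d). Single query row for clarity.
    m, l = -np.inf, 0.0                  # running max and softmax denominator
    acc = np.zeros(v.shape[1])
    for start in range(0, k.shape[0], tile):
        s = k[start:start + tile] @ q    # one tile of attention scores
        m_new = max(m, s.max())
        corr = np.exp(m - m_new)         # rescale previously accumulated state
        p = np.exp(s - m_new)
        l = l * corr + p.sum()
        acc = acc * corr + p @ v[start:start + tile]
        m = m_new
    return acc / l
```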
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
meta-llama/llama3
The official Meta Llama 3 GitHub site
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
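As a toy picture of what k-bit quantization means here, an absmax INT8 round trip (a deliberately simplified scheme, not bitsandbytes' actual kernels):

```python
import numpy as np

def absmax_quantize_int8(w):
    scale = np.abs(w).max() / 127.0      # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = absmax_quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())   # small reconstruction error
```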
siboehm/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
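The writeup's central optimization is tiling: compute C in blocks so each tile of A and B is reused from fast memory. The same blocking idea, shown in NumPy rather than CUDA for brevity:

```python
import numpy as np

def blocked_matmul(a, b, tile=32):
    # Same cache-blocking idea a fast SGEMM kernel uses, in NumPy for clarity.
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):   # accumulate over the K dimension
                c[i:i + tile, j:j + tile] += (
                    a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                )
    return c
```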
apache/tvm
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
karpathy/llm.c
LLM training in simple, raw C/CUDA
bayesian-optimization/BayesianOptimization
A Python implementation of global optimization with Gaussian processes.
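A short usage sketch against the package's documented pbounds/maximize interface (the objective function here is made up):

```python
from bayes_opt import BayesianOptimization

def objective(x, y):
    # Toy objective with its maximum at (2, 1).
    return -((x - 2) ** 2) - ((y - 1) ** 2)

optimizer = BayesianOptimization(
    f=objective,
    pbounds={"x": (-5, 5), "y": (-5, 5)},
    random_state=1,
)
optimizer.maximize(init_points=5, n_iter=25)
print(optimizer.max)   # best parameters and target found so far
```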
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components to create Python and C++ runtimes that execute those engines.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
nicksypark/rope-triton
Rotary Position Embedding (RoPE) implemented in Triton
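For reference, the transform such a kernel computes, in NumPy (rotate-half layout assumed; a Triton version would fuse this per block):

```python
import numpy as np

def rope(x, base=10000.0):
    # x: (seq_len, dim), dim even; pairs (x_i, x_{i + dim/2}) are rotated.
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # per-pair rotation frequencies
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```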
AGI-Edgerunners/LLM-Agents-Papers
A repo that lists papers related to LLM-based agents
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
forhaoliu/ringattention
Transformers with Arbitrarily Large Context
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, delivering better performance with lower memory utilization in both training and inference.
unslothai/unsloth
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
jcpeterson/openwebtext
Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.