hongsunjang's Stars
rasbt/LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
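A minimal sketch of vLLM's offline batched-inference API (the model name is just an example):

```python
from vllm import LLM, SamplingParams

# Load a model and generate with batched, memory-efficient (PagedAttention) inference.
llm = LLM(model="facebook/opt-125m")  # example model; any HF causal LM works
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], sampling)
print(outputs[0].outputs[0].text)
```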
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
tloen/alpaca-lora
Instruct-tune LLaMA on consumer hardware
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
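As a quick illustration of what parameter-efficient fine-tuning looks like with PEFT, a minimal LoRA sketch (base model and target modules are illustrative choices):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # example base model

# Wrap the frozen base model with small trainable low-rank adapters.
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"])  # GPT-2's fused QKV projection
model = get_peft_model(model, config)

# Only the adapter weights train; typically well under 1% of all parameters.
model.print_trainable_parameters()
```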
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP and PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, along with candidate inference solutions such as HF TGI and vLLM for local or cloud deployment, plus demo apps showcasing Meta Llama for WhatsApp and Messenger.
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
bigscience-workshop/petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components to create Python and C++ runtimes that execute those engines.
NVIDIA/FasterTransformer
Transformer-related optimizations, including BERT and GPT.
axboe/fio
Flexible I/O Tester
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
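A sketch of the AutoGPTQ quantization flow, following the pattern in its README (model name, calibration text, and output path are placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weights, quantized in groups of 128 columns.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ needs a small calibration set to estimate quantization error.
examples = [tokenizer("auto-gptq is an easy-to-use quantization package.",
                      return_tensors="pt")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")
```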
juncongmoo/pyllama
LLaMA: Open and Efficient Foundation Language Models
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
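For intuition, a toy round-to-nearest weight quantizer showing the setting GPTQ operates in; GPTQ itself goes further, using second-order (Hessian) information to compensate quantization error column by column, but the quantize/dequantize bookkeeping looks like this:

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4):
    """Toy symmetric round-to-nearest quantization, one scale per output row."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit signed
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(8, 16).astype(np.float32)
q, scale = quantize_rtn(w)
w_hat = q.astype(np.float32) * scale              # dequantized weights
print("mean abs error:", np.abs(w - w_hat).mean())
```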
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
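A sketch of MII's non-persistent pipeline API, following its README (the model name is an example):

```python
import mii

# Non-persistent pipeline: loads the model in-process and serves it locally.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")  # example model
response = pipe(["DeepSpeed is"], max_new_tokens=64)
print(response)
```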
openai/sparse_attention
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
mit-han-lab/torchquantum
A PyTorch-based framework for quantum-classical simulation, quantum machine learning, quantum neural networks, and parameterized quantum circuits, with support for easy deployment on real quantum computers.
mlcommons/ck
Collective Knowledge (CK) and Collective Mind (CM): community-driven projects to learn how to run AI, ML, and other emerging workloads more efficiently and cost-effectively across diverse models, datasets, software, and hardware, using CK, CM/CMX, and MLPerf automations.
FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
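The heavy-hitter idea in miniature: score each cached token by its accumulated attention mass and evict the lowest-scoring ones once the KV cache exceeds a budget. A simplified toy policy (not the paper's exact method, which combines heavy hitters with a recent-token window, crudely mimicked here):

```python
import numpy as np

def h2o_keep_indices(attn_weights: np.ndarray, budget: int, recent: int = 4):
    """attn_weights: (num_queries, seq_len) attention probabilities.
    Keep the `recent` most recent tokens plus the heaviest hitters up to `budget`."""
    seq_len = attn_weights.shape[1]
    if seq_len <= budget:
        return np.arange(seq_len)
    scores = attn_weights.sum(axis=0)             # accumulated attention per token
    keep = set(range(seq_len - recent, seq_len))  # always keep the recent window
    for idx in np.argsort(-scores):               # then add the heaviest hitters
        if len(keep) >= budget:
            break
        keep.add(int(idx))
    return np.array(sorted(keep))

attn = np.random.dirichlet(np.ones(32), size=8)   # fake attention over 32 tokens
print(h2o_keep_indices(attn, budget=16))
```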
FMInference/DejaVu
[ICML'23] Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time.
AIS-SNU/Smart-Infinity
[HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
KimHanjung/VISAGE
[ECCV 2024] VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
zhaoshiji123/MTARD
Official code for the ECCV 2022 paper "Enhanced Accuracy and Robustness via Multi-Teacher Adversarial Distillation".
sanagno/adaptively_sparse_attention
SamsungLabs/Genie
Official Implementation of "Genie: Show Me the Data for Quantization" (CVPR 2023)
readwrite112/AGAThA
[PPoPP'24] AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
Digilent/digilent-mig
hongsunjang/docker-pyenv-poetry
A Docker image with pyenv and Poetry.