hongsunjang's Stars
rasbt/LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
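A minimal sketch of vLLM's offline batched-inference API (the model name is just an example):

```python
from vllm import LLM, SamplingParams

# Load a model and generate with batched, memory-efficient (PagedAttention) inference.
llm = LLM(model="facebook/opt-125m")  # example model; any HF causal LM works
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], sampling)
print(outputs[0].outputs[0].text)
```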
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
tloen/alpaca-lora
Instruct-tune LLaMA on consumer hardware
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
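As a quick illustration of what parameter-efficient fine-tuning looks like with PEFT, a minimal LoRA sketch (base model and target modules are illustrative choices):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # example base model

# Wrap the frozen base model with small trainable low-rank adapters.
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"])  # GPT-2's fused QKV projection
model = get_peft_model(model, config)

# Only the adapter weights train; typically well under 1% of all parameters.
model.print_trainable_parameters()
```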
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP and PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, along with candidate inference solutions such as HF TGI and vLLM for local or cloud deployment, plus demo apps showcasing Meta Llama for WhatsApp and Messenger.
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
bigscience-workshop/petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components to create Python and C++ runtimes that execute those engines.
NVIDIA/FasterTransformer
Transformer-related optimizations, including BERT and GPT.
axboe/fio
Flexible I/O Tester
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
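A sketch of the AutoGPTQ quantization flow, following the pattern in its README (model name, calibration text, and output path are placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weights, quantized in groups of 128 columns.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ needs a small calibration set to estimate quantization error.
examples = [tokenizer("auto-gptq is an easy-to-use quantization package.",
                      return_tensors="pt")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")
```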
juncongmoo/pyllama
LLaMA: Open and Efficient Foundation Language Models
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
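For intuition, a toy round-to-nearest weight quantizer showing the setting GPTQ operates in; GPTQ itself goes further, using second-order (Hessian) information to compensate quantization error column by column, but the quantize/dequantize bookkeeping looks like this:

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4):
    """Toy symmetric round-to-nearest quantization, one scale per output row."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit signed
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(8, 16).astype(np.float32)
q, scale = quantize_rtn(w)
w_hat = q.astype(np.float32) * scale              # dequantized weights
print("mean abs error:", np.abs(w - w_hat).mean())
```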
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
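A sketch of MII's non-persistent pipeline API, following its README (the model name is an example):

```python
import mii

# Non-persistent pipeline: loads the model in-process and serves it locally.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")  # example model
response = pipe(["DeepSpeed is"], max_new_tokens=64)
print(response)
```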
openai/sparse_attention
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
mit-han-lab/torchquantum
A PyTorch-based framework for quantum-classical simulation, quantum machine learning, quantum neural networks, and parameterized quantum circuits, with support for easy deployment on real quantum computers.
mlcommons/ck
Collective Knowledge (CK) and Collective Mind (CM): community-driven projects to learn how to run AI, ML, and other emerging workloads more efficiently and cost-effectively across diverse models, datasets, software, and hardware, using CK, CM/CMX, and MLPerf automations.
FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
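The heavy-hitter idea in miniature: score each cached token by its accumulated attention mass and evict the lowest-scoring ones once the KV cache exceeds a budget. A simplified toy policy (not the paper's exact method, which combines heavy hitters with a recent-token window, crudely mimicked here):

```python
import numpy as np

def h2o_keep_indices(attn_weights: np.ndarray, budget: int, recent: int = 4):
    """attn_weights: (num_queries, seq_len) attention probabilities.
    Keep the `recent` most recent tokens plus the heaviest hitters up to `budget`."""
    seq_len = attn_weights.shape[1]
    if seq_len <= budget:
        return np.arange(seq_len)
    scores = attn_weights.sum(axis=0)             # accumulated attention per token
    keep = set(range(seq_len - recent, seq_len))  # always keep the recent window
    for idx in np.argsort(-scores):               # then add the heaviest hitters
        if len(keep) >= budget:
            break
        keep.add(int(idx))
    return np.array(sorted(keep))

attn = np.random.dirichlet(np.ones(32), size=8)   # fake attention over 32 tokens
print(h2o_keep_indices(attn, budget=16))
```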
FMInference/DejaVu
[ICML'23] Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time.
AIS-SNU/Smart-Infinity
[HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
KimHanjung/VISAGE
[ECCV 2024] VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
zhaoshiji123/MTARD
Official code for the ECCV 2022 paper "Enhanced Accuracy and Robustness via Multi-Teacher Adversarial Distillation".
sanagno/adaptively_sparse_attention
SamsungLabs/Genie
Official Implementation of "Genie: Show Me the Data for Quantization" (CVPR 2023)
readwrite112/AGAThA
[PPoPP'24] AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
Digilent/digilent-mig
hongsunjang/docker-pyenv-poetry
A Docker image with pyenv and Poetry.