hustzxd
PhD, Institute of Computing Technology (ICT), University of Chinese Academy of Sciences (UCAS).
AMD · Beijing
hustzxd's Stars
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
lizongying/my-tv
My TV: live-TV streaming software; ready to use right after installation
LlamaFamily/Llama-Chinese
Llama Chinese community. The Llama 3 online demo and fine-tuned models are now available, with a continuously updated collection of the latest Llama 3 learning resources; all code has been updated for Llama 3. Building the best Chinese Llama LLM, fully open source and commercially usable.
triton-lang/triton
Development repository for the Triton language and compiler
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
srush/GPU-Puzzles
Solve puzzles. Learn CUDA.
huggingface/text-generation-inference
Large Language Model Text Generation Inference
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
microsoft/DeepSpeedExamples
Example models using DeepSpeed
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
mosaicml/composer
Supercharge Your Model Training
ahmetbersoz/chatgpt-prompts-for-academic-writing
This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
ashishpatel26/LLM-Finetuning
LLM fine-tuning with PEFT
horseee/Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
RahulSChand/gpu_poor
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
pytorch/PiPPy
Pipeline Parallelism for PyTorch
locuslab/wanda
A simple and effective LLM pruning approach.
xuhangc/ChatGPT-Academic-Prompt
Use ChatGPT for academic writing
FMInference/DejaVu
llm-efficiency-challenge/neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
metebalci/pdftitle
A utility to extract the title from a PDF file
IST-DASLab/SparseFinetuning
Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llm-foundry
CASIA-IVA-Lab/FLAP
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
Raincleared-Song/sparse_gpu_operator
GPU operators for sparse tensor operations
DS3Lab/Decentralized_FM_alpha
rhhc/EfficientPaperList
Papers on pruning, quantization, and efficient inference/training.
hustzxd/PaperListTemplate
A template that makes it easy to manage paper lists.