JingyangXiang's Stars
voxel51/fiftyone
Refine high-quality datasets and visual AI models
openai-translator/openai-translator
Browser extension and cross-platform desktop application for word-selection translation, based on the ChatGPT API.
huggingface/pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Includes training, evaluation, inference, and export scripts, plus pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
S-LoRA/S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
01-ai/Yi
A series of large language models trained from scratch by developers @01-ai
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
locuslab/massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
Z-Siqi/Clash-for-Windows_Chinese
Chinese localization of Clash for Windows: provides localized builds, localization patches, and a localized installer
JunHao-Zhu/MyPlotRepo
Some templates for visualizing data
cascremers/pdfdiff
Command-line tool to inspect the difference between (the text in) two PDF files
sagemath/sage
Main repository of SageMath
AniZpZ/AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
mit-han-lab/lmquant
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
facebookresearch/SpinQuant
Code repo for the paper "SpinQuant: LLM Quantization with Learned Rotations"
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
NVlabs/MambaVision
Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
thu-nics/qllm-eval
Code repository for the paper "Evaluating Quantized Large Language Models"
ShiArthur03/ShiArthur03
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
hiyouga/LLaMA-Factory
Efficiently fine-tune 100+ LLMs in a WebUI (ACL 2024)
Dao-AILab/fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
JingyangXiang/OvSW
PyTorch implementation of our ECCV 2024 paper "OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks"
bytedance/decoupleQ
A quantization algorithm for LLMs
OpenGVLab/OmniQuant
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.