polarispw's Stars
TheAlgorithms/Python
All Algorithms implemented in Python
chatanywhere/GPT_API_free
Free ChatGPT API key; a free ChatGPT API with GPT-4 support, relayed so it is usable inside China without a proxy. Works with clients/plugins such as ChatBox, greatly reducing API cost. Unlimited chat from within China.
state-spaces/mamba
Mamba SSM architecture
chen08209/FlClash
A multi-platform proxy client based on ClashMeta; simple and easy to use, open source, and ad-free.
liguodongiot/llm-action
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and real-world application deployment).
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
tickstep/aliyunpan
A command-line client for Aliyun Drive, with support for JavaScript plugins and sync/backup.
deepseek-ai/DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
horseee/Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
facebookresearch/MobileLLM
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024).
AIoT-MLSys-Lab/Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
microsoft/MInference
[NeurIPS'24 Spotlight] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
locuslab/wanda
A simple and effective LLM pruning approach.
SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
mazhengcn/suggested-notation-for-machine-learning
A suggested mathematical notation protocol for machine learning.
microsoft/TransformerCompression
Code for compression methods for transformers, accompanying the authors' publications.
catid/dora
Implementation of DoRA
Infini-AI-Lab/TriForce
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
HanGuo97/lq-lora
AIoT-MLSys-Lab/SVD-LLM
Official Code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression"
hahnyuan/ASVD4LLM
Activation-aware Singular Value Decomposition for Compressing Large Language Models
QC-LY/UniBind
The source code for "UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All"
Dousia/MetricPrompt
Code for KDD 2023 long paper: MetricPrompt: Prompting Model as a Relevance Metric for Few-Shot Text Classification
MichaelYang-lyx/LLM-Code-Benchmark
A benchmark and corresponding evaluation system for LLMs.
Spidy20/AWS-Assistant-RAG-ChatBot
A tutorial on building a GPT-4 AWS helper chatbot using LangChain, Lambda, and API Gateway, with PostgreSQL PGVector hosted on an EC2 instance as the vector database.