Dinghow's Stars
ollama/ollama
Get up and running with Llama 3, Mistral, Gemma, and other large language models.
xai-org/grok-1
Grok open release
state-spaces/mamba
Mamba SSM architecture
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
vosen/ZLUDA
CUDA on AMD GPUs
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
google/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
QwenLM/Qwen1.5
Qwen1.5 is the improved version of Qwen, the large language model series developed by Qwen team, Alibaba Cloud.
project-baize/baize-chatbot
Let ChatGPT teach your own chatbot in hours with a single GPU!
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
dvmazur/mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops
QwenLM/Qwen-Agent
Agent framework and applications built upon Qwen2, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
DefTruth/Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
databricks/megablocks
NVIDIA/cuda-python
CUDA Python Low-level Bindings
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
SafeAILab/EAGLE
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
mit-han-lab/distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
bytedance/ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
SkunkworksAI/hydra-moe
chujiezheng/chat_templates
Chat Templates for 🤗 HuggingFace Large Language Models
InternLM/Agent-FLAN
[ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
Q-Future/Q-Bench
①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.
efeslab/Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
hemingkx/Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
thunlp/MoEfication
stanford-futuredata/stk
xmed-lab/C2RV-CBCT
CVPR 2024, "C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction"
ruyue0001/Backdoor_DPR
Code for "Backdoor Attacks on Dense Passage Retrievers for Disseminating Misinformation"