Dinghow's Stars
ollama/ollama
Get up and running with Llama 3.3, Phi 4, Gemma 2, and other large language models.
xai-org/grok-1
Grok open release
state-spaces/mamba
Mamba SSM architecture
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
vosen/ZLUDA
CUDA on non-NVIDIA GPUs
QwenLM/Qwen2
Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
google/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
QwenLM/Qwen-Agent
Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and a Chrome extension.
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
project-baize/baize-chatbot
Let ChatGPT teach your own chatbot in hours with a single GPU!
dvmazur/mixtral-offloading
Run Mixtral-8x7B models in Colab or on consumer desktops
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
databricks/megablocks
NVIDIA/cuda-python
CUDA Python: Performance meets Productivity
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
mit-han-lab/distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
chujiezheng/chat_templates
Chat Templates for 🤗 HuggingFace Large Language Models
bytedance/ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
SkunkworksAI/hydra-moe
InternLM/Agent-FLAN
[ACL 2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
efeslab/Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Q-Future/Q-Bench
① [ICLR 2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus + 16 open-source MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.
hemingkx/Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
thunlp/MoEfication
stanford-futuredata/stk
xmed-lab/C2RV-CBCT
CVPR 2024, "C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction"
ruyue0001/Backdoor_DPR
Code for "Backdoor Attacks on Dense Passage Retrievers for Disseminating Misinformation"