yhwang-hub's Stars
deepseek-ai/DeepSeek-V3
deepseek-ai/DeepSeek-R1
huggingface/open-r1
Fully open reproduction of DeepSeek-R1
kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Jiayi-Pan/TinyZero
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
lyogavin/airllm
AirLLM: 70B model inference on a single 4GB GPU
jingyaogong/minimind-v
🚀 Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour! 🌏
deepseek-ai/DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
alibaba/TinyNeuralNetwork
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
FireRedTeam/FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.
triton-inference-server/tutorials
This repository contains tutorials and examples for Triton Inference Server
zeux/calm
CUDA/Metal accelerated language model inference
deeperlearning/professional-cuda-c-programming
Maharshi-Pandya/cudacodes
Learnings and programs related to CUDA
alibaba/ChatLearn
A flexible and efficient training framework for large-scale alignment tasks
godweiyang/GrabGPU
A convenient script for grabbing idle GPUs
andrewkchan/deepseek.cpp
CPU inference for the DeepSeek family of large language models in pure C++
IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
leimao/ONNX-Runtime-Inference
ONNX Runtime Inference C++ Example
datawhalechina/llm-deploy
Theory and practice of large language model (LLM) inference and deployment
Tongkaio/CUDA_Kernel_Samples
A guide to hand-writing CUDA kernels and preparing for interviews
quic/ai-hub-apps
The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.
bytedance/decoupleQ
A quantization algorithm for LLMs
daquexian/faster-rwkv
BBuf/tensorrt-llm-moe
ViffyGwaanl/DeepSeek-Api-Test
There are currently many DeepSeek API providers on the market. Use DeepSeek Api Test to determine which API performs best.
DataXujing/DeepSeek-R1-Android
:fire: Deploying the DeepSeek-R1 distilled 1.5B model on Android phones
caibucai22/awesome-cuda
Awesome code, projects, books, etc. related to CUDA
Shibodd/cpp_scipy_rectangular_lsap
scipy.optimize.linear_sum_assignment edited for straightforward usage in C++ and Eigen
shifan3/TensorRT-LLM-qwen2-vl
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.