wangtianxia-sjtu's Stars
QwenLM/Qwen2.5-Coder
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud.
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
jingyaogong/minimind
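"Large model": train a small 26M-parameter GPT completely from scratch in 3 hours; a personal GPU is enough for both inference and training!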
ekondis/gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
intel/xFasterTransformer
NVIDIA/cccl
CUDA Core Compute Libraries
lllyasviel/Fooocus
Focus on prompting and generating
karpathy/build-nanogpt
Video+code lecture on building nanoGPT from scratch
xtensor-stack/xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
karpathy/LLM101n
LLM101n: Let's build a Storyteller
FlagOpen/FlagPerf
FlagPerf is an open-source software platform for benchmarking AI chips.
NVIDIA/gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
ProjectMitosisOS/dmerge-eurosys24-ae
Artifact evaluation repo for EuroSys'24.
smartnickit-project/smartnic-bench
A Rust-based benchmark for BlueField SmartNICs.
boostorg/compute
A C++ GPU Computing Library for OpenCL
lipracer/cuda-rt-hook
LargeWorldModel/LWM
3b1b/manim
Animation engine for explanatory math videos
NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
ccfddl/ccf-deadlines
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet) / If you find it useful, please star this project, thanks~
vosen/ZLUDA
CUDA on non-NVIDIA GPUs
haoliuhl/ringattention
Transformers with Arbitrarily Large Context
andravin/wincnn
Winograd minimal convolution algorithm generator for convolutional neural networks.
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
NVIDIA/cuda-checkpoint
CUDA checkpoint and restore utility
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
SJTU-IPADS/Bamboo
Bamboo-7B Large Language Model
microsoft/DirectML
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
halpz/re3