ziyang-arch's Stars
meta-llama/llama
Inference code for Llama models
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
pytorch/examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
pyg-team/pytorch_geometric
Graph Neural Network Library for PyTorch
triton-lang/triton
Development repository for the Triton language and compiler
AI4Finance-Foundation/FinRL
FinRL: Financial Reinforcement Learning. 🔥
OpenMathLib/OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
ROCm/HIP
HIP: C++ Heterogeneous-Compute Interface for Portability
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
juncongmoo/pyllama
LLaMA: Open and Efficient Foundation Language Models
HuaizhengZhang/Awesome-System-for-Machine-Learning
A curated list of research in machine learning systems (MLSys). Paper notes are also provided.
NiuTrans/ABigSurvey
A collection of 1000+ survey papers on Natural Language Processing (NLP) and Machine Learning (ML).
NVIDIA-developer-blog/code-samples
Source code examples from the Parallel Forall Blog
pytorch/torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
matrix-profile-foundation/matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
microsoft/msccl
Microsoft Collective Communication Library
Zilize/DrawCV
Awesome CV template based on Draw.io.
KernelTuner/kernel_tuner
Kernel Tuner
yzhaiustc/Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
astra-sim/astra-sim
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
ROCm/rccl
ROCm Communication Collectives Library (RCCL)
msr-fiddle/philly-traces
tukl-msd/DRAMPower
Fast and accurate DRAM power and energy estimation tool
yzhaiustc/Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
microsoft/msccl-tools
Synthesizer for optimal collective communication algorithms
parasailteam/coconet
AMDResearch/DAGEE
Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as task graphs that are scheduled concurrently and asynchronously on both CPUs and GPUs.
ziyang-arch/Hybrid-Cooling-For-Data-Center