Pinned Repositories
algoperf_results
algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
BufferOverflowLab
C_declaration_parser
cod-labs
Collection of my COD (Computer Organization and Design) lab code
ComputerArchitectureLab
This repository releases the labs for the Computer Architecture course at USTC
easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
hipress
USTC-CS-Courses-Resource
Course resources for the USTC School of Computer Science
mark14wu's Repositories
mark14wu/cod-labs
Collection of my COD (Computer Organization and Design) lab code
mark14wu/hipress
mark14wu/USTC-CS-Courses-Resource
Course resources for the USTC School of Computer Science
mark14wu/algoperf_results
mark14wu/algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
mark14wu/benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
mark14wu/BufferOverflowLab
mark14wu/C_declaration_parser
mark14wu/ComputerArchitectureLab
This repository releases the labs for the Computer Architecture course at USTC
mark14wu/easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
mark14wu/hipress-examples
mark14wu/csapp_labs
mark14wu/DeepSpeedExamples
Example models using DeepSpeed
mark14wu/hipress-mxnet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
mark14wu/hipress-overlapping-profiling-results
mark14wu/hipress-overlapping-profiling-scripts
mark14wu/JaxProfiler
Profiler for JAX
mark14wu/llama3
The official Meta Llama 3 GitHub site
mark14wu/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
mark14wu/nccl-tests
NCCL Tests
mark14wu/OSH-2018.github.io
Course homepage
mark14wu/Paddle
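PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)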
mark14wu/PyProf
A GPU performance profiling tool for PyTorch models
mark14wu/SE-2019-CloudMusic
mark14wu/SoftwareEngineeringHW
mark14wu/spack
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
mark14wu/torch-hipress-extension
mark14wu/Triton-Puzzles
Puzzles for learning Triton
mark14wu/triton_kernel_benchmarks
mark14wu/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators