Pinned Repositories
algoperf_results
algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
BufferOverflowLab
C_declaration_parser
cod-labs
A collection of my COD (Computer Organization and Design) lab code
ComputerArchitectureLab
This repository releases the labs for the Computer Architecture course at USTC
csapp-malloclab
CS:APP malloc lab: write a dynamic storage allocator
easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
hipress
USTC-CS-Courses-Resource
USTC School of Computer Science course resources
mark14wu's Repositories
mark14wu/cod-labs
A collection of my COD (Computer Organization and Design) lab code
mark14wu/hipress
mark14wu/USTC-CS-Courses-Resource
USTC School of Computer Science course resources
mark14wu/algoperf_results
mark14wu/algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
mark14wu/BufferOverflowLab
mark14wu/C_declaration_parser
mark14wu/ComputerArchitectureLab
This repository releases the labs for the Computer Architecture course at USTC
mark14wu/csapp-malloclab
CS:APP malloc lab: write a dynamic storage allocator
mark14wu/easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
mark14wu/hipress-examples
mark14wu/csapp_labs
mark14wu/DeepSpeedExamples
Example models using DeepSpeed
mark14wu/examples
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
mark14wu/hipress-mxnet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
mark14wu/hipress-overlapping-profiling-results
mark14wu/hipress-overlapping-profiling-scripts
mark14wu/JaxProfiler
profiler for jax
mark14wu/llama3
The official Meta Llama 3 GitHub site
mark14wu/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
mark14wu/nccl-tests
NCCL Tests
mark14wu/OSH-2018.github.io
Course homepage
mark14wu/OSH2018-Project-Draft
Draft of OSH Project
mark14wu/Paddle
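PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)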
mark14wu/PyProf
A GPU performance profiling tool for PyTorch models
mark14wu/SE-2019-CloudMusic
mark14wu/SoftwareEngineeringHW
mark14wu/spack
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
mark14wu/torch-hipress-extension
mark14wu/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators