Pinned Repositories
ai-matrix
Makes it easy to benchmark AI accelerators
AIAccelerators-AE
Artifact Description/Evaluation (AD/AE) repository for the paper on AI Accelerator Evaluation
aime-team-pytorch-benchmarks
A benchmark framework for PyTorch
ao
PyTorch native quantization and sparsity for training and inference
applied-ai
Applied AI experiments and examples for PyTorch
attention-gym
Helpful tools and examples for working with flex-attention
BigCode-Megatron-LM
Ongoing research on training transformer models at scale
ByteMLPerf
An AI accelerator benchmark that evaluates AI accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
MITgcm
MIT General Circulation Model master code and documentation repository
onnx-dojo
ce107's Repositories
ce107/aime-team-pytorch-benchmarks
A benchmark framework for PyTorch
ce107/ao
PyTorch native quantization and sparsity for training and inference
ce107/applied-ai
Applied AI experiments and examples for PyTorch
ce107/attention-gym
Helpful tools and examples for working with flex-attention
ce107/BigCode-Megatron-LM
Ongoing research on training transformer models at scale
ce107/ByteMLPerf
An AI accelerator benchmark that evaluates AI accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
ce107/ConvBench
ce107/cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
ce107/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
ce107/CUDA_Bench
CUDA GPU Benchmark
ce107/device-benchmarks
Benchmarks of different devices I have come across
ce107/dl_scaling
Scaling deep learning on HPC systems
ce107/dlio_benchmark
An I/O benchmark for deep learning applications
ce107/flashinfer
FlashInfer: Kernel Library for LLM Serving
ce107/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
ce107/hf-rocm-benchmark
A reproducible benchmark of Text Generation Inference and Transformers as of April 2024 on AMD Instinct MI250 and MI300
ce107/Lancet-Accelerating-MoE-Training-via-Whole-Graph-Computation-Communication-Overlapping
Official implementation for the paper Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping, published in MLSys'24.
ce107/LLM-Inference-Bench
LLM-Inference-Bench
ce107/llm-inference-benchmark
LLM Inference benchmark
ce107/milabench
Repository of machine learning benchmarks
ce107/ml-engineering
Machine Learning Engineering Open Book
ce107/ml_communications
ML communications benchmark
ce107/olcf-ai-training-series
OLCF AI Training Series Material
ce107/pytorch-distributed
A quickstart and benchmark for PyTorch distributed training.
ce107/pytorch-gpu-benchmark
Benchmarks of well-known CNN models in PyTorch on various GPUs.
ce107/pytorch-micro-benchmarking
ce107/pytorch-transformers-wikitext2-benchmark
GPT-2 fine-tuning benchmark using PyTorch and Hugging Face Transformers for comparing GPUs
ce107/SimAI-Bench
ALCF benchmarks for coupled simulation and AI workflows
ce107/transformers-benchmarks
Real Transformer TeraFLOPS on various GPUs
ce107/tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.