lessw2020
AI/PyTorch Partner Engineer - Meta AI (Facebook AI) · Principal Software Engineer - Audere · Software Architect - X10 Wireless · Dev/PM - Microsoft
Seattle, WA USA
Pinned Repositories
Best-Deep-Learning-Optimizers
Collection of the latest and greatest deep learning optimizers for PyTorch, suitable for CNN and NLP models
FAdam_PyTorch
An implementation of FAdam (Fisher Adam) in PyTorch
mish
Mish Deep Learning Activation Function for PyTorch / FastAI
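For reference, Mish is defined as mish(x) = x · tanh(softplus(x)); below is a minimal PyTorch sketch of that formula (newer PyTorch releases also ship it natively as torch.nn.functional.mish):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x)).

    Minimal reference sketch; recent PyTorch versions expose the
    same function as F.mish.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(F.softplus(x))
```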
Ranger-Deep-Learning-Optimizer
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
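Conceptually, the LookAhead half of Ranger keeps a slow copy of the weights and pulls it toward the fast (inner-optimizer) weights every k steps. A minimal sketch of just that wrapper, assuming torch.optim.RAdam as the inner optimizer and omitting Gradient Centralization (an illustration, not the repo's actual implementation):

```python
import torch

class Lookahead:
    """Minimal Lookahead wrapper: keep a slow copy of the weights and,
    every k fast steps, interpolate it toward the fast weights."""
    def __init__(self, base_optimizer, k: int = 6, alpha: float = 0.5):
        self.base, self.k, self.alpha = base_optimizer, k, alpha
        self.step_count = 0
        # Slow weights start as detached copies of the fast weights.
        self.slow = [[p.detach().clone() for p in g["params"]]
                     for g in base_optimizer.param_groups]

    def zero_grad(self):
        self.base.zero_grad()

    def step(self):
        self.base.step()                   # fast step: the inner RAdam update
        self.step_count += 1
        if self.step_count % self.k == 0:  # every k steps, synchronize
            for group, slow_group in zip(self.base.param_groups, self.slow):
                for p, slow in zip(group["params"], slow_group):
                    slow.add_(p.detach() - slow, alpha=self.alpha)
                    p.data.copy_(slow)     # reset fast weights to slow weights

# Usage sketch (model is any nn.Module):
#   opt = Lookahead(torch.optim.RAdam(model.parameters(), lr=1e-3))
```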
Ranger-Mish-ImageWoof-5
Repo to build on / reproduce the record-breaking Ranger-Mish-SelfAttention setup on the FastAI ImageWoof dataset (5 epochs)
Ranger21
A rewrite of the Ranger deep learning optimizer using the newest components
Ranger22
Testing various improvements to Ranger21 for 2022
res2net-plus
Res2Net architecture with improved stem and Mish activation function
training-detr
Unofficial Colab on how to train DETR, the intelligent object detector, with your own dataset. DETR = Detection Transformer
transformer_central
Various transformers for FSDP research
lessw2020's Repositories
lessw2020/Ranger22
Testing various improvements to Ranger21 for 2022
lessw2020/AdamW-Triton-PyTorch
Can AdamW written in Triton be as performant as the fused CUDA implementation?
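For context, the baseline any Triton port has to match is the decoupled-weight-decay Adam update. A plain-PyTorch sketch of one step (the reference math only, not the Triton kernel):

```python
import torch

@torch.no_grad()
def adamw_step(p, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update on tensors p/m/v, mirroring torch.optim.AdamW
    semantics (step is 1-indexed)."""
    p.mul_(1 - lr * weight_decay)                        # decoupled weight decay
    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # 1st-moment EMA
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # 2nd-moment EMA
    m_hat = m / (1 - beta1 ** step)                      # bias correction
    v_hat = v / (1 - beta2 ** step)
    p.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)
```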
lessw2020/MARS-AdamW-PyTorch
Unofficial implementation of MARS-AdamW in PyTorch
lessw2020/expert-token-resonance-pytorch
Unofficial implementation of Expert Token Resonance in PyTorch (and later Triton)
lessw2020/torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
lessw2020/adopt
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate"
lessw2020/nanoGPT_2d
2D tensor parallelism
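As background, 2D tensor parallelism shards a weight matrix along both its input and output dimensions across a device grid; each device computes one block-partial, and partials are summed along the input-shard axis. A single-process, plain-tensor illustration of that block math (no torch.distributed, and not this repo's code):

```python
import torch

B, D_in, D_out, R, C = 4, 8, 6, 2, 2   # batch, dims, 2x2 "device" grid
x = torch.randn(B, D_in)
w = torch.randn(D_in, D_out)

# Shard W along rows (input dim) into R pieces and columns (output dim) into C.
w_blocks = [list(row.chunk(C, dim=1)) for row in w.chunk(R, dim=0)]
x_shards = x.chunk(R, dim=1)            # each row-group holds a slice of x

# "Device" (i, j) computes a partial for output columns j; summing over i
# stands in for the all-reduce across the input-shard axis.
y_cols = [sum(x_shards[i] @ w_blocks[i][j] for i in range(R)) for j in range(C)]
y = torch.cat(y_cols, dim=1)

assert torch.allclose(y, x @ w, atol=1e-5)
```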
lessw2020/cuSZp2
Artifact (AD/AE) version of the SC24 cuSZp2 paper
lessw2020/rmsnorm_explore
Triton RMSNorm dev
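RMSNorm normalizes by the root-mean-square over the hidden dimension, with no mean-centering: y = x / sqrt(mean(x²) + ε) · g. A plain-PyTorch reference that a Triton kernel can be validated against (a sketch, not this repo's kernel):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Reference RMSNorm: normalize by RMS over the last dim, then scale."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight
```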
lessw2020/adroitActivations
Compressing activations
lessw2020/bfcompress
lessw2020/cfx-research
lessw2020/DIMAT
lessw2020/disagg
Disaggregated serving system for LLMs
lessw2020/duo-attention
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
lessw2020/etalon
LLM Serving Performance Evaluation Harness
lessw2020/fast.cu
Fastest kernels written from scratch
lessw2020/llm-baselines
nanoGPT-like codebase for LLM training
lessw2020/LoongServe
lessw2020/moe_inference
lessw2020/nino
Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks"
lessw2020/OpenDiloco
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
lessw2020/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
lessw2020/sglang
SGLang is a fast serving framework for large language models and vision language models.
lessw2020/SORSA
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models
lessw2020/SWIFT
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
lessw2020/torchtitan_oss
A native PyTorch Library for large model training
lessw2020/torchtune
A Native-PyTorch Library for LLM Fine-tuning
lessw2020/triton
Development repository for the Triton language and compiler
lessw2020/zipnn
A Lossless Compression Library for AI pipelines