Pinned Repositories
Autoencoder-Clustering
Replication of "Auto-encoder Based Data Clustering" (Song et al.)
CapsNet-Adversarial
Capsule networks can defend against adversarial attacks using reconstruction error
cifar10-airbench
94% on CIFAR-10 in 2.6 seconds 💨 96% in 27 seconds
Evaluate-CrossMax-Ensemble
An evaluation of the robust accuracy of the CrossMax Ensemble technique (Fort et al., 2024)
modded-nanogpt
NanoGPT (124M) in 5 minutes
Muon
Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead
REPAIR
Code release for REPAIR: REnormalizing Permuted Activations for Interpolation Repair
ResNet-PyTorch-CIFAR10
PyTorch implementation of residual networks trained on the CIFAR-10 dataset (2017)
TriMap-PyTorch
Implementation of TriMap dimensionality reduction in PyTorch
tSNE-Animation
Hacking sklearn's t-SNE implementation to animate the embedding process
KellerJordan's Repositories
KellerJordan/modded-nanogpt
NanoGPT (124M) in 5 minutes
KellerJordan/cifar10-airbench
94% on CIFAR-10 in 2.6 seconds 💨 96% in 27 seconds
KellerJordan/Muon
Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead
KellerJordan/REPAIR
Code release for REPAIR: REnormalizing Permuted Activations for Interpolation Repair
KellerJordan/hlb-CIFAR10
Train to 94% on CIFAR-10 in 4.4 seconds on a single A100
KellerJordan/top-sgd
Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization)
KellerJordan/cifar10-loader
Fast and easy to use CIFAR-10 dataloader
KellerJordan/Exponentiated-Gradient-PyTorch
EG plus/minus optimizer implemented in PyTorch
KellerJordan/CIFAR-cuda
welcome to the learning zone
KellerJordan/gpt-sandbox
KellerJordan/elastic-airbench
KellerJordan/negative-self-influence
neural networks don't minimize loss [caution: probably due to batchnorm]
KellerJordan/research-airbench
Variant of cifar10-airbench which removes several tricks. Ideal for research
KellerJordan/CIFAR10-isolated-rng
CIFAR-10 training script with separate seeds for model initialization, data ordering, and data augmentation
KellerJordan/Evaluate-CrossMax-Ensemble
An evaluation of the robust accuracy of the CrossMax Ensemble technique (Fort et al., 2024)
KellerJordan/pixelated-features-bugs
Code for training and evaluating the robustness of models using pixelated data
KellerJordan/BatchNorm-adaptation-behavior
The adaptation behavior of BatchNorm is no different than Norm-Free
KellerJordan/ffcv-cifar
Train a large number of CIFAR-10 models using FFCV
KellerJordan/ffcv-imagenet
Train ImageNet *fast* in 500 lines of code with FFCV -- forked for training only Resnet18s
KellerJordan/flash-attention
Fast and memory-efficient exact attention
KellerJordan/git-re-basin
Code release for "Git Re-Basin: Merging Models modulo Permutation Symmetries"
KellerJordan/jupyter-fork
KellerJordan/llm.c
LLM training in simple, raw C/CUDA
KellerJordan/Megatron-LM
Ongoing research training transformer models at scale
KellerJordan/MNIST-test
KellerJordan/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
KellerJordan/robustness-featsnotbugs
A replication of "Adversarial Examples Are Not Bugs, They Are Features" https://arxiv.org/abs/1905.02175
KellerJordan/share-data
KellerJordan/trak
A fast, effective data attribution method for neural networks in PyTorch
KellerJordan/zf-compiler