The architecture is PoolFormer, but different channels use different pooling kernel sizes, and there is no downsampling.
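Below is a minimal sketch of what such a token mixer could look like, not the repo's actual implementation: channel groups are average-pooled with different kernel sizes at stride 1 so the spatial resolution is preserved. The class name, argument names, and default kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class MultiKernelPooling(nn.Module):
    """Average-pools channel groups with different kernel sizes, keeping H x W."""

    def __init__(self, dim: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        assert dim % len(kernel_sizes) == 0, "dim must split evenly across groups"
        self.group_dim = dim // len(kernel_sizes)
        # stride=1 with symmetric padding keeps the spatial size unchanged (no downsampling)
        self.pools = nn.ModuleList(
            nn.AvgPool2d(k, stride=1, padding=k // 2, count_include_pad=False)
            for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); split channels into one group per kernel size
        groups = torch.split(x, self.group_dim, dim=1)
        pooled = [pool(g) for pool, g in zip(self.pools, groups)]
        out = torch.cat(pooled, dim=1)
        # PoolFormer-style mixer: subtract the input so only the pooled residual is modeled
        return out - x


if __name__ == "__main__":
    mixer = MultiKernelPooling(dim=96, kernel_sizes=(3, 5, 7))
    y = mixer(torch.randn(2, 96, 32, 32))
    print(y.shape)  # torch.Size([2, 96, 32, 32]) -- spatial size preserved
```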
I develop inside the January 2024 release of the NVIDIA PyTorch Docker image:
docker run -it -d --gpus all -v /workspace:/workspace nvcr.io/nvidia/pytorch:24.01-py3
Implementations are in `src`, and the training script is in `scripts` along with a few sanity checks. The training script expects CIFAR-10/100 to be in a folder called `data`.