This is the official PyTorch implementation of RepLKNet, from the following CVPR-2022 paper:
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs.
The paper is released on arXiv: https://arxiv.org/abs/2203.06717.
Update: training code released.
framework | link |
---|---|
MegEngine (official) | https://github.com/megvii-research/RepLKNet |
PyTorch (official) | https://github.com/DingXiaoH/RepLKNet-pytorch |
TensorFlow | https://github.com/shkarupa-alex/tfreplknet |
PaddlePaddle | re-implementations are welcome |
... |
More re-implementations are welcome.
We have released an example for PyTorch. Please check `setup.py` and `depthwise_conv2d_implicit_gemm.py` (a replacement of `torch.nn.Conv2d`) in https://github.com/MegEngine/cutlass/tree/master/examples/19_large_depthwise_conv2d_torch_extension.
- Clone `cutlass` (https://github.com/MegEngine/cutlass), enter the directory.
- `cd examples/19_large_depthwise_conv2d_torch_extension`
- `./setup.py install --user`. If you get errors, check your `CUDA_HOME`.
- A quick check: `python depthwise_conv2d_implicit_gemm.py`
- Add `WHERE_YOU_CLONED_CUTLASS/examples/19_large_depthwise_conv2d_torch_extension` into your `PYTHONPATH` so that you can `from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM` anywhere. Then you may use `DepthWiseConv2dImplicitGEMM` as a replacement of `nn.Conv2d`.
- `export LARGE_KERNEL_CONV_IMPL=WHERE_YOU_CLONED_CUTLASS/examples/19_large_depthwise_conv2d_torch_extension` so that RepLKNet will use the efficient implementation. Or you may simply modify the related code (`get_conv2d`) in `replknet.py`.
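The last two steps can be sketched as a small helper that reads `LARGE_KERNEL_CONV_IMPL`, puts that directory on `sys.path`, and tries the import. This is a minimal sketch, not the code in `replknet.py`. The helper names `load_large_kernel_conv` and `is_large_depthwise` and the `min_kernel` threshold are hypothetical, and the eligibility conditions are assumptions about when a depthwise large-kernel layer is a drop-in match.

```python
import importlib
import os
import sys


def load_large_kernel_conv(env_var="LARGE_KERNEL_CONV_IMPL"):
    """Return the DepthWiseConv2dImplicitGEMM class, or None if unavailable."""
    ext_dir = os.environ.get(env_var)
    if not ext_dir or not os.path.isdir(ext_dir):
        return None  # variable unset or pointing at a missing directory
    if ext_dir not in sys.path:
        sys.path.append(ext_dir)  # same effect as adding it to PYTHONPATH
    try:
        module = importlib.import_module("depthwise_conv2d_implicit_gemm")
    except ImportError:
        return None  # extension not built, or its dependencies not importable
    return getattr(module, "DepthWiseConv2dImplicitGEMM", None)


def is_large_depthwise(in_channels, out_channels, kernel_size, stride,
                       padding, dilation, groups, min_kernel=7):
    """Assumed eligibility test: square depthwise conv with 'same' padding.

    min_kernel is a hypothetical threshold below which nn.Conv2d is kept.
    """
    return (
        in_channels == out_channels == groups
        and kernel_size >= min_kernel
        and stride == 1
        and dilation == 1
        and padding == kernel_size // 2
    )
```

A caller could fall back to `nn.Conv2d` whenever `load_large_kernel_conv()` returns `None` or `is_large_depthwise(...)` is false.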
Our implementation mentioned in the paper has been integrated into MegEngine, which uses it automatically. If you would like to use it in other frameworks such as TensorFlow, you may need to compile our released CUDA sources (the `*.cu` files in the above example should work with other frameworks) and load them with suitable tools, just as the PyTorch example does with `cutlass` and `torch.utils.cpp_extension`. We would appreciate it if you could share your experience with us.
You may refer to the MegEngine source code: https://github.com/MegEngine/MegEngine/tree/8a2e92bd6c5ac02807b27d174dce090ee391000b/dnn/src/cuda/conv_bias/chanwise.
Pull requests (e.g., better or other implementations, or implementations on other frameworks) are welcome.
- Model code
- PyTorch pretrained models
- PyTorch large-kernel conv impl
- PyTorch training code
- PyTorch downstream models
- PyTorch downstream code
name | resolution | ImageNet-1K acc | #params | FLOPs | ImageNet-1K pretrained model |
---|---|---|---|---|---|
RepLKNet-31B | 224x224 | 83.5 | 79M | 15.3G | Google Drive, Baidu |
RepLKNet-31B | 384x384 | 84.8 | 79M | 45.1G | Google Drive, Baidu |
name | resolution | ImageNet-1K acc | #params | FLOPs | 22K pretrained model | 1K finetuned model |
---|---|---|---|---|---|---|
RepLKNet-31B | 224x224 | 85.2 | 79M | 15.3G | Google Drive, Baidu | Google Drive, Baidu |
RepLKNet-31B | 384x384 | 86.0 | 79M | 45.1G | - | Google Drive, Baidu |
RepLKNet-31L | 384x384 | 86.6 | 172M | 96.0G | Google Drive, Baidu | Google Drive, Baidu |
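As a sanity check on the FLOPs column: for a convolutional backbone, FLOPs grow roughly with the number of input pixels, so the 384x384 figure should be close to the 224x224 figure times (384/224)^2. The sketch below checks this for RepLKNet-31B. The helper name `scale_flops` is hypothetical, and the quadratic scaling is an approximation that ignores resolution-independent layers such as the classifier head.

```python
def scale_flops(flops_g, src_res, dst_res):
    """Estimate GFLOPs at dst_res from GFLOPs at src_res, assuming cost ~ H*W."""
    return flops_g * (dst_res / src_res) ** 2


# 15.3G is the RepLKNet-31B figure at 224x224 from the tables above.
estimate = scale_flops(15.3, 224, 384)
print(f"estimated {estimate:.1f}G vs. reported 45.1G")
```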
(uploading)
name | resolution | ImageNet-1K acc | #params | FLOPs | MegData-73M pretrained model | 1K finetuned model |
---|---|---|---|---|---|---|
RepLKNet-XL | 320x320 | 87.8 | 335M | 128.7G |
You may use multi-node training on a SLURM cluster with submitit. Please install:
pip install submitit
If you have limited GPU memory (e.g., 2080Ti), use `--use_checkpoint True` to save GPU memory.
Single machine:
python -m torch.distributed.launch --nproc_per_node=8 main.py --model RepLKNet-31B --drop_path 0.5 --batch_size 64 --lr 4e-3 --update_freq 4 --model_ema true --model_ema_eval true --data_path /path/to/imagenet-1k --warmup_epochs 10 --epochs 300 --use_checkpoint True --output_dir your_training_dir
Four machines:
python run_with_submitit.py --nodes 4 --ngpus 8 --model RepLKNet-31B --drop_path 0.5 --batch_size 64 --lr 4e-3 --update_freq 4 --model_ema true --model_ema_eval true --data_path /path/to/imagenet-1k --warmup_epochs 10 --epochs 300 --use_checkpoint True --job_dir your_training_dir
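Because the training script is based on ConvNeXt, the effective batch size of the two commands above can be estimated with ConvNeXt's convention: nodes * GPUs per node * `--batch_size` * `--update_freq`. The sketch below assumes that convention is unchanged in this repository; `effective_batch_size` is a hypothetical helper, not part of `main.py`.

```python
def effective_batch_size(nodes, gpus_per_node, batch_size, update_freq):
    """Samples contributing to one optimizer step (assumed ConvNeXt convention)."""
    return nodes * gpus_per_node * batch_size * update_freq


# Flags copied from the single-machine and four-machine commands.
single = effective_batch_size(nodes=1, gpus_per_node=8, batch_size=64, update_freq=4)
multi = effective_batch_size(nodes=4, gpus_per_node=8, batch_size=64, update_freq=4)
print(single, multi)  # 2048 8192
```

With the flags as listed, the four-machine command would process four times as many samples per optimizer step. Under this assumption, `--update_freq 1` on four machines would match the single-machine figure.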
Single machine:
The released PyTorch training script is based on the code of ConvNeXt, which was built using the timm library and the DeiT and BEiT repositories.
This project is released under the MIT license. Please see the LICENSE file for more information.