Re-producing issue
youngwanLEE opened this issue · 12 comments
Hi,
To check reproducibility, I tried to train the coat_lite_mini model (reported 79.1/94.5) and got 78.85/94.42 using this command:
bash scripts/train.sh coat_lite_mini coat_lite_mini
with the default settings, such as a batch size of 256, on 8 GPUs (TITAN RTX).
Is such a small difference (79.1 vs. 78.9) negligible?
My environment :
sys.platform linux
Python 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
numpy 1.19.2
Compiler GCC 7.5
CUDA compiler CUDA 10.1
detectron2 arch flags 7.5
DETECTRON2_ENV_MODULE
PyTorch 1.7.0
PyTorch debug build True
GPU available True
GPU 0,1,2,3,4,5,6,7 TITAN RTX (arch=7.5)
CUDA_HOME /usr/local/cuda-10.1
Pillow 8.0.1
torchvision 0.8.0
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5
fvcore 0.1.2.post20201218
cv2 Not found
PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
Hi @youngwanLEE, thank you for your reproducibility experiment! I think a 0.1~0.2% difference is acceptable (our reported result is actually rounded from 79.090%), since we also observed similar variations in some of our trials. Thus, I think your results are reasonable.
@xwjabc Thanks for your quick reply :)
@youngwanLEE Besides, we sometimes found that using 16 GPUs instead of 8 (with other settings the same) can slightly improve performance (around a 0.1% improvement), but we have not validated this extensively. You may give it a try :)
@xwjabc,
On the other hand, did you decrease the batch size from 256 to 128?
When I tried to train the coat_mini model with the default settings (batch size of 256) on an 8-GPU machine with V100s (32GB), an out-of-memory error occurred.
So I had no choice but to reduce the batch size from 256 to 128.
@youngwanLEE Yes, we also reduced the batch size per GPU and used more GPUs for the coat_mini model (since we train on 24GB GPUs such as TITAN RTX or RTX 3090).
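The exchange above boils down to keeping the effective (global) batch size constant while trading per-GPU batch size against GPU count. A minimal sketch of that arithmetic, with hypothetical helper names (`effective_batch_size`, `scaled_lr`) and the common linear learning-rate scaling rule, which the CoaT repo may or may not apply automatically:

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int) -> int:
    """Total batch size seen by the optimizer under data-parallel training."""
    return per_gpu_batch * num_gpus

def scaled_lr(base_lr: float, effective_bs: int, reference_bs: int = 256) -> float:
    """Linear LR scaling heuristic: scale the learning rate in proportion
    to the effective batch size relative to the reference setting."""
    return base_lr * effective_bs / reference_bs

# 32 per GPU x 8 GPUs and 16 per GPU x 16 GPUs both give a global batch of 256,
# so no LR change is needed; halving the global batch to 128 would halve the LR.
print(effective_batch_size(32, 8))    # 256
print(effective_batch_size(16, 16))   # 256
print(scaled_lr(5e-4, 128))           # 2.5e-4
```

This is why the authors' approach (smaller per-GPU batch, more GPUs) preserves the training recipe, whereas simply halving the global batch size may require retuning the learning rate.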
@xwjabc Hi,
I have another question.
How do you compute the computational cost (a.k.a. FLOPs)?
Hi @youngwanLEE, for the arXiv paper, we used a modified version of the FLOPs calculation script from mmcv, following PVT (the calculation for the attention part is modified accordingly). We will add the script to the repo soon!
@xwjabc oh, good news !!
Thanks :)
@xwjabc Hi,
I want to share my reproduced result for coat_mini: 81.494 / 95.568, which is higher than your reported number :)
Cool !!
My environment :
pytorch: 1.7
torchvision: 0.8.1
GPUs : RTX8000 x 8
batch-size-per-gpu : 256
Training time: 6 days, 15:51:55
@youngwanLEE Cool! Currently we are still exploring ways to improve the efficiency of CoaT models. We hope that we can obtain faster and better models in the end.
Hi, @xwjabc
I would like to know when the FLOPs calculation script will be released.
Thanks in advance :)
@youngwanLEE Sorry for the late reply! Previously we were busy preparing for the paper rebuttal. We will release the FLOPs calculation script soon as well as some larger models (CoaT Small (~20M) and CoaT-Lite Medium (~40M)). Thanks!