/ViT-Adapter-DINOv2-panoptic

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions

Primary LanguagePythonApache License 2.0Apache-2.0

install CUDA and set CUDA_HOME environment variable conda create -n openmmlab python
conda activate openmmlab
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124 (change to right cuda version)
pip install mmcv-full scipy timm mmdet==2.25.0
pip install git+https://github.com/cocodataset/panopticapi.git
cd detection/ops
sh make.sh

put coco in detection/data in this structure:

detection
├── data
│   ├── coco
│   │   ├── annotations
|   |   |   ├── instances_train2017.json
|   |   |   ├── instances_val2017.json
│   │   │   ├── panoptic_train2017.json
│   │   │   ├── panoptic_train2017
│   │   │   ├── panoptic_val2017.json
│   │   │   ├── panoptic_val2017
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── test2017

modify /home/tkerssies/miniconda3/envs/openmmlab/lib/python3.12/site-packages/mmdet/init.py (change to right path)

# assert (mmcv_version >= digit_version(mmcv_minimum_version)
#         and mmcv_version <= digit_version(mmcv_maximum_version)), \
#     f'MMCV=={mmcv.__version__} is used but incompatible. ' \
#     f'Please install mmcv>={mmcv_minimum_version}, <={mmcv_maximum_version}.'

modify /home/tkerssies/miniconda3/envs/openmmlab/lib/python3.12/site-packages/torch/nn/parallel/_functions.py (change to right path)

def _get_stream(device: torch.device):
    """Get a background stream for copying between CPU and target device."""
    global _streams
    device_mod = getattr(torch, "cuda", None)
    if device_mod is None:
        return None
    if _streams is None:
        _streams = [None] * device_mod.device_count()
    if _streams[device] is None:
        _streams[device] = device_mod.Stream(device)
    return _streams[device]

modify /home/tkerssies/miniconda3/envs/openmmlab/lib/python3.12/site-packages/mmcv/parallel/distributed.py (change to right path)

module_to_run = self.module

use sbatch slurm_train.sh to train DINOv2reg4-L for panoptic segmentation on 16 h100 GPUs with 4 gpus per node on Snellius