CVPR 2023 1st Foundation Model Challenge - Track 2
Leaderboard A: 3rd Place Solution
HARDWARE & SOFTWARE
Ubuntu 22.04
CPU: 13900k
GPU: 1 × RTX 4090, 24 GB
Python: 3.9.13
PyTorch: 2.0.0+cu118
Directory Structure
|-- CVPR/
| |-- models/
| |-- ...
| |-- utils/
| |-- ...
| |-- ...
|-- data/
| |-- car_text/
| |-- ...
| |-- train/
| |-- train_images/
| |-- ...
| |-- train_label.txt
| |-- test/
| |-- test_images/
| |-- ...
| |-- test_label.txt
| |-- val/
| |-- val_images/
| |-- ...
| |-- val_label.txt
|-- pretrained_weights/
| |-- ...
|-- ...
Pipeline
1. Download the data from the official link.
2. Run data_analyzing.ipynb to explore the dataset and perform caption-to-label mapping, dataset merging, etc.
3. Run Data_preparing.ipynb to split the dataset with stratified K-fold for local validation.
4. Train 3 models for the 3 car sub-classification tasks.
5. Train 12 models for the 12 pedestrian sub-classification tasks.
6. Train a general pedestrian-vs-car classification model.
7. Split the test data into pedestrian and car subsets with the model trained in step 6.
8. Run inference for the car sub-classification tasks with the models trained in step 4, and retrieve the top-10 images per query.
9. Run inference for the pedestrian sub-classification tasks with the models trained in step 5, and retrieve the top-10 images per query.
10. Merge the results from steps 8 and 9 into submission.json.
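The stratified K-fold split in step 3 can be sketched as follows. This is a minimal illustration using scikit-learn; the actual column names and fold counts live in Data_preparing.ipynb, so the `type` column and the 10-fold setting here are assumptions:

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

def add_fold_column(df, label_col, n_splits=10, seed=42):
    """Assign a fold index to each row, stratified by the label column."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    df = df.copy()
    df["fold"] = -1
    for fold, (_, val_idx) in enumerate(skf.split(df, df[label_col])):
        df.loc[df.index[val_idx], "fold"] = fold
    return df

# toy example: 100 images, 4 car types
df = pd.DataFrame({"image": [f"img_{i}.jpg" for i in range(100)],
                   "type": [i % 4 for i in range(100)]})
df = add_fold_column(df, "type", n_splits=10)
```

Stratification keeps each attribute's class balance roughly equal across folds, so the local validation score on any one fold tracks the full-data distribution.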
Pretrained Models
- EVA02_CLIP_L_336_psz14_s6B (visual) and EVA02_CLIP_L_psz14_s4B (visual) from EVA
- eva02_large_patch14_448.mim_m38m_ft_in22k_in1k from timm
- ConvNext-XXLarge-soup from open_clip_torch
Training
- Car Classification Training (task in [type, color, brand]; cfg in [config_eva_vit_car_cl, config_eva_02_car_cl, config_conv_car_cl]; size in [224 (eva-l), 280 (conv), 336 (eva-l-336), 448 (eva02-448)]):
!CUDA_VISIBLE_DEVICES=0 \
python -m torch.distributed.launch --nproc_per_node=1 \
CVPR/train_car_cl.py \
--csv-dir data/train_val_cars_{task}_10fold.csv \
--config-name {cfg} \
--image-size {size} \
--epochs 10 \
--init-lr 3e-5 \
--batch-size 32 \
--num-workers 8 \
--nbatch_log 300 \
--warmup_epochs 0 \
--fold 1
- Pedestrian Classification Training (task in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]; cfg in [config_eva_vit_car_cl, config_eva_02_car_cl, config_conv_car_cl]; size in [224 (eva-l), 280 (conv), 336 (eva-l-336), 448 (eva02-448)]):
!CUDA_VISIBLE_DEVICES=0 \
python -m torch.distributed.launch --nproc_per_node=1 \
CVPR/train_people_cl.py \
--csv-dir data/train_val_peoples_code_fold_{task}.csv \
--config-name {cfg} \
--image-size {size} \
--epochs 16 \
--init-lr 3e-5 \
--batch-size 32 \
--num-workers 8 \
--nbatch_log 300 \
--warmup_epochs 1 \
--fold 1
- Pedestrian-Car Classification Training:
!CUDA_VISIBLE_DEVICES=0 \
python -m torch.distributed.launch --nproc_per_node=1 \
CVPR/train_people_car_cl.py \
--csv-dir data/train_val_cl_20fold.csv \
--config-name 'config_eva_vit_people_car_cl' \
--image-size 224 \
--epochs 11 \
--init-lr 3e-5 \
--batch-size 32 \
--num-workers 8 \
--nbatch_log 300 \
--warmup_epochs 0 \
--fold 1
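All three training commands expose --init-lr and --warmup_epochs. As an illustration only (the real schedule is defined in the training configs; cosine decay after linear warmup is an assumption here, as is the function name), a per-step learning rate could look like:

```python
import math

def lr_at(step, total_steps, warmup_steps, base_lr=3e-5):
    """Linear warmup to base_lr, then cosine decay to zero (assumed schedule)."""
    if warmup_steps and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

With --warmup_epochs 0 the warmup branch is skipped entirely and decay starts from the first step.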
Inference
- Car Classification Inference (Ensemble):
!python CVPR/inference_car_emsenble.py \
--csv-dir data/test/test_car.csv \
--config_names config_eva_02_car_cl config_conv_car_cl \
--image-sizes 448 320 \
--model-weights 0.8 0.2 \
--model_paths1 output/car/eva_02-car-type/eva_02-448_best_ep5.pth output/car/conv-car-type/convnext_xxlarge_best_ep5.pth \
--model_paths2 output/car/eva_02-car-color/eva_02-448_best_ep5.pth output/car/conv-car-color/convnext_xxlarge_best_ep5.pth \
--model_paths3 output/car/eva_02-car-brand/eva_02-448_best_ep5.pth output/car/conv-car-brand/convnext_xxlarge_best_ep5.pth \
--batch-size 32 \
--num-workers 8
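The --model-weights 0.8 0.2 flags above blend the two models' outputs. A minimal sketch of weighted probability averaging, which is assumed to be what the ensemble script does internally:

```python
import numpy as np

def weighted_ensemble(prob_list, weights):
    """Combine per-model class probabilities with normalized weights."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()              # normalize to sum to 1
    stacked = np.stack(prob_list)                  # (n_models, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)  # (n_samples, n_classes)

# toy example with the 0.8 / 0.2 weights used for the car ensemble
p_eva = np.array([[0.7, 0.3], [0.2, 0.8]])
p_conv = np.array([[0.4, 0.6], [0.5, 0.5]])
ensembled = weighted_ensemble([p_eva, p_conv], [0.8, 0.2])
```

Normalizing the weights first means the blended rows still sum to 1, so the result can be ranked like ordinary class probabilities.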
- Pedestrian Classification Inference (Ensemble):
!python CVPR/inference_people_cl_emsenble.py \
--csv-dir data/test/test_people.csv \
--config_names config_eva_vit_people_cl config_eva_02_people_cl \
--image-sizes 224 448 \
--model-weights 0.5 0.5 \
--batch-size 32 \
--model_path1 \
output/people/eva-l-people-0/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-1/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-2/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-3/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-4/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-5/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-6/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-7/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-8/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-9/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-10/eva-cl-l_best_ep8.pth \
output/people/eva-l-people-11/eva-cl-l_best_ep8.pth \
--model_path2 \
output/people/eva02-l-448-people-0/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-1/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-2/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-3/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-4/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-5/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-6/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-7/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-8/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-9/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-10/eva_02-448_best_ep8.pth \
output/people/eva02-l-448-people-11/eva_02-448_best_ep8.pth \
--num-workers 8 \
--nbatch_log 300
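Steps 8 and 9 of the pipeline retrieve the top-10 images per text query. Shown here only as an illustration of the final ranking step (the scores themselves come from the attribute classifiers; the function name is hypothetical):

```python
import numpy as np

def topk_images(scores, image_names, k=10):
    """Return the k image names with the highest scores for one query."""
    order = np.argsort(scores)[::-1][:k]
    return [image_names[i] for i in order]

# toy example: three gallery images, keep the best two
ranked = topk_images(np.array([0.1, 0.9, 0.5]), ["a.jpg", "b.jpg", "c.jpg"], k=2)
```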
- Pedestrian-Car General Classification Inference:
!python CVPR/inference_people_car_cl.py \
--test_data_path data/test/test_images \
--config-name config_eva_vit_people_car_cl \
--image-size 224 \
--batch-size 32 \
--model_path output/cl/eva-l/eva-cl-l_best_ep2.pth \
--num-workers 8 \
--nbatch_log 300
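Step 7 of the pipeline uses this general model's predictions to route each test image to either the car or the pedestrian pipeline. A minimal sketch, assuming a predictions table with 'image' and 'pred' columns (0 = car, 1 = pedestrian; both column names and the label encoding are assumptions):

```python
import pandas as pd

def split_by_category(preds):
    """Split general-classifier predictions into car and pedestrian subsets,
    which would then be written to test_car.csv and test_people.csv."""
    cars = preds.loc[preds["pred"] == 0, ["image"]].reset_index(drop=True)
    people = preds.loc[preds["pred"] == 1, ["image"]].reset_index(drop=True)
    return cars, people

# toy predictions from the general model
preds = pd.DataFrame({"image": ["a.jpg", "b.jpg", "c.jpg"], "pred": [0, 1, 0]})
cars, people = split_by_category(preds)
```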
Contact
Email: 3579628328@qq.com