CVP Summer'23 - SOD

Salient Object Detection for Korean Name Card

DIS-R

This is the source code for the project "Salient Object Detection for Korean Name Card", developed for the "Computer Vision" course, Summer 2023.


Abstract

Existing image segmentation tasks mainly focus on segmenting objects with specific characteristics, e.g., salient, camouflaged, meticulous, or specific categories. Most of them share the same input/output formats and rarely use mechanisms designed exclusively for their segmentation targets, which means almost all tasks are dataset-dependent. Thus, it is very promising to formulate a category-agnostic dichotomous image segmentation (DIS) task for accurately segmenting objects with different structure complexities, regardless of their characteristics. Compared with semantic segmentation, the proposed DIS task usually focuses on images with a single target or a few targets, from which richer and more accurate details of each target can feasibly be obtained.

In this project, we investigate how well salient object detection works in a real-world setting by evaluating several methods on the Korean Name Card dataset.

Folder Structure

cvps23/
├── configs/ - training configs
│   ├── README.md - config naming style
│   ├── */README.md - abstract and experiment results for each model
│   └── api/ - wandb API key for monitoring
│
├── tools/ - scripts for downloading data, training, testing, inference, and the web interface
│
├── trainer/ - trainer classes
│
├── model/
│   ├── architecture/ - model architectures
│   └── README.md - definitions of losses and metrics
│
├── base/ - abstract base classes
│
├── data/ - input data storage
│
├── data_loader/ - custom dataset and dataloader
│
├── saved/ - trained models, configs, log directory, and logging output
│
├── logger/ - module for wandb visualization and logging
│
└── utils/ - utility functions

Model Zoo

Salient Object Detection
U2Net (PR'2020)
DIS (ECCV'2022)

Usage

Install the required packages:

pip install -r utils/requirements.txt

Running a private repository on Kaggle:

  1. Generate your personal access token.
  2. Get the repo address from github.com/.../...git, then clone it:
git clone https://your_personal_token@your_repo_address.git
cd CVP

Config file format

Config files are in YAML format, for example:
name: U2NetFull_scratch_1gpu-bs4_KNC_size320x320

n_gpu: 1

arch:
  type: u2net_full
  args: {}

data_loader:
  type: KNC_DataLoader
  args:
    batch_size: 4
    shuffle: true
    num_workers: 1
    validation_split: 0.1
    output_size: 320
    crop_size: 288

optimizer:
  type: Adam
  args:
    lr: 0.001
    weight_decay: 0
    eps: 1.e-8
    betas:
      - 0.9
      - 0.999

loss: multi_bce_fusion

metrics:
  - mae
  - sm

lr_scheduler:
  type: StepLR
  args:
    step_size: 50
    gamma: 0.1

trainer:
  type: Trainer

  epochs: 1000
  save_dir: saved/
  save_period: 10
  verbosity: 1

  visual_tool: wandb
  project: cvps23
  name: U2NetFull_scratch_1gpu-bs4_KNC_size320x320

  # Edit the username below to track runs across multiple WandB accounts
  api_key_file: ./configs/api-key/tuanlda78202
  entity: tuanlda78202
  
test:
  save_dir: saved/generated
  n_sample: 1000
  batch_size: 32
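
To make the `optimizer` and `lr_scheduler` settings above concrete: assuming `StepLR` refers to PyTorch's `torch.optim.lr_scheduler.StepLR` (the config does not say so explicitly), the learning rate is multiplied by `gamma` once every `step_size` epochs. The sketch below reproduces that schedule in plain Python for the values in the config; the helper name `step_lr` is made up for illustration.

```python
def step_lr(base_lr: float, step_size: int, gamma: float, epoch: int) -> float:
    """Learning rate at a given 0-indexed epoch under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


# Values taken from the config above.
BASE_LR, STEP_SIZE, GAMMA = 0.001, 50, 0.1

for epoch in (0, 49, 50, 100, 150):
    lr = step_lr(BASE_LR, STEP_SIZE, GAMMA, epoch)
    print(f"epoch {epoch:>3}: lr = {lr:.1e}")
```

Under this assumption the rate drops by a factor of ten at epochs 50, 100, 150, and so on, so a run that really lasts the full 1000 epochs would spend most of its time at a very small learning rate. Whether that is intended depends on how long training is actually run.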

Using config files

Modify the settings in the .yaml config files, then run:

python scripts/train_dis.py [CONFIG] [RESUME] [DEVICE] [BATCH_SIZE] [EPOCHS]
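
The bracketed placeholders suggest that the training script takes a config path plus optional overrides. The script's real argument parser is not shown in this README, so the following is only a sketch of how such a command line could be handled with `argparse`. Apart from `--resume`, which appears in the next section, every flag name, the `apply_overrides` helper, and the `configs/example.yaml` path are assumptions.

```python
import argparse
import copy


def build_parser() -> argparse.ArgumentParser:
    # Flag names other than --resume are hypothetical, not the project's real CLI.
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--config", type=str, default=None, help="path to a .yaml config file")
    parser.add_argument("--resume", type=str, default=None, help="checkpoint to resume from")
    parser.add_argument("--device", type=str, default=None, help="e.g. 'cpu' or '0'")
    parser.add_argument("--batch-size", type=int, default=None, help="overrides data_loader.args.batch_size")
    parser.add_argument("--epochs", type=int, default=None, help="overrides trainer.epochs")
    return parser


def apply_overrides(config: dict, args: argparse.Namespace) -> dict:
    """Return a copy of the config with CLI values replacing config values where given."""
    merged = copy.deepcopy(config)
    if args.batch_size is not None:
        merged["data_loader"]["args"]["batch_size"] = args.batch_size
    if args.epochs is not None:
        merged["trainer"]["epochs"] = args.epochs
    return merged


# A dict standing in for the parsed YAML config (only the relevant keys).
config = {"data_loader": {"args": {"batch_size": 4}}, "trainer": {"epochs": 1000}}

# Simulate: python train.py --config configs/example.yaml --batch-size 8
args = build_parser().parse_args(["--config", "configs/example.yaml", "--batch-size", "8"])
merged = apply_overrides(config, args)
print(merged["data_loader"]["args"]["batch_size"], merged["trainer"]["epochs"])
```

Keeping the YAML file as the source of defaults and letting the command line override only what is passed makes it possible to reuse one config across runs with different batch sizes or epoch counts.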

Resuming from checkpoints

You can resume from a previously saved checkpoint by running:

python scripts/train.py --resume path/to/the/ckpt

Evaluating

python scripts/test.py
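
The config lists `mae` and `sm` as metrics. The project's own definitions are kept in model/README.md and are not reproduced here. As a rough illustration, mean absolute error for saliency maps is commonly computed as the average per-pixel absolute difference between the predicted map and the ground-truth mask, both scaled to [0, 1]. The function below is a minimal pure-Python sketch under that assumption, not the repository's implementation.

```python
from typing import Sequence


def mae(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean absolute error between two equally sized 2D maps with values in [0, 1]."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("pred and target must have the same shape")
    total = sum(
        abs(p - t)
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    )
    count = sum(len(row) for row in pred)
    return total / count


# Toy 2x2 example: a predicted saliency map and its binary ground-truth mask.
pred = [[0.9, 0.2], [0.1, 0.8]]
mask = [[1.0, 0.0], [0.0, 1.0]]
print(round(mae(pred, mask), 3))
```

Lower values indicate a prediction closer to the mask; a perfect prediction gives 0.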

Inference

Contributors