This is the source code of the project "Salient Object Detection for Korean Name Card" for the course "Computer Vision", Summer 2023.
Existing image segmentation tasks mainly focus on segmenting objects with specific characteristics, e.g., salient, camouflaged, or meticulous objects, or objects of specific categories. Most of them share the same input/output formats and rarely use mechanisms designed exclusively for their targets, which means that almost all of these tasks are dataset-dependent. It is therefore promising to formulate a category-agnostic Dichotomous Image Segmentation (DIS) task that accurately segments objects of different structural complexities, regardless of their characteristics. Compared with semantic segmentation, DIS usually focuses on images containing a single target or a few targets, which makes it more feasible to recover rich, accurate details for each target.
In this project, we investigate how well salient object detection works in the real world by experimenting with several methods and examining whether, and how, they work on the Korean Name Card dataset.
cvps23/
├── configs/ - training configs
│   ├── README.md - config naming convention
│   ├── */README.md - abstract and experimental results of each model
│   └── api/ - WandB API keys for monitoring
│
├── tools/ - scripts for downloading data, training, testing, inference, and the web interface
│
├── trainer/ - trainer classes
│
├── model/
│   ├── architecture/ - model architectures
│   └── README.md - definitions of losses and metrics
│
├── base/ - abstract base classes
│
├── data/ - input data
│
├── data_loader/ - custom datasets and data loaders
│
├── saved/ - trained model configs, log directory, and logging output
│
├── logger/ - module for WandB visualization and logging
│
└── utils/ - utility functions
Supported methods (salient object detection): U2Net (PR'2020) and DIS (ECCV'2022).
Install the required packages:
pip install -r utils/requirements.txt
Running a private repository on Kaggle:
- Generate a GitHub personal access token.
- Get the repository address from github.com/.../...git, then clone it:
git clone https://your_personal_token@your_repo_address.git
cd CVP
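To avoid pasting the token directly into a notebook cell, the clone URL can also be assembled in Python from an environment variable. This is a minimal sketch under stated assumptions: the `GITHUB_TOKEN` variable name and the `build_clone_url` / `clone_private_repo` helpers are hypothetical, and `repo_address` is assumed to be the `github.com/.../...git` address, with or without `https://`.

```python
import os
import subprocess
from typing import Optional


def build_clone_url(token: str, repo_address: str) -> str:
    """Return an HTTPS clone URL with a personal access token embedded."""
    # Accept the address with or without a scheme, e.g. "github.com/user/repo.git".
    for scheme in ("https://", "http://"):
        if repo_address.startswith(scheme):
            repo_address = repo_address[len(scheme):]
    return f"https://{token}@{repo_address}"


def clone_private_repo(repo_address: str, dest: Optional[str] = None) -> None:
    """Clone a private repo using a token from GITHUB_TOKEN (hypothetical variable name)."""
    token = os.environ["GITHUB_TOKEN"]  # raises KeyError if the variable is not set
    cmd = ["git", "clone", build_clone_url(token, repo_address)]
    if dest is not None:
        cmd.append(dest)
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

On Kaggle, the token could be stored as a notebook secret and read with `UserSecretsClient().get_secret(...)` from the `kaggle_secrets` module before being exported to the environment. Passing the command as a list to `subprocess.run` avoids shell interpolation of the token, but git still records the full remote URL, token included, in the clone's `.git/config`.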
Config files are in YAML format, for example:
name: U2NetFull_scratch_1gpu-bs4_KNC_size320x320
n_gpu: 1
arch:
  type: u2net_full
  args: {}
data_loader:
  type: KNC_DataLoader
  args:
    batch_size: 4
    shuffle: true
    num_workers: 1
    validation_split: 0.1
    output_size: 320
    crop_size: 288
optimizer:
  type: Adam
  args:
    lr: 0.001
    weight_decay: 0
    eps: 1.e-8
    betas:
      - 0.9
      - 0.999
loss: multi_bce_fusion
metrics:
  - mae
  - sm
lr_scheduler:
  type: StepLR
  args:
    step_size: 50
    gamma: 0.1
trainer:
  type: Trainer
  epochs: 1000
  save_dir: saved/
  save_period: 10
  verbosity: 1
  visual_tool: wandb
  project: cvps23
  name: U2NetLite_scratch_1gpu-bs4_KNC_size320x320
  # Edit the username to track runs with multiple WandB accounts
  api_key_file: ./configs/api-key/tuanlda78202
  entity: tuanlda78202
test:
  save_dir: saved/generated
  n_sample: 1000
  batch_size: 32
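With `lr_scheduler` set to `StepLR` (`step_size: 50`, `gamma: 0.1`), the learning rate is multiplied by `gamma` every 50 epochs, i.e. lr(e) = lr0 × gamma^(e // step_size), assuming the scheduler is stepped once per epoch. The sketch below only evaluates that closed form to show what the example config implies; the `step_lr` helper is hypothetical and does not call PyTorch.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 50, gamma: float = 0.1) -> float:
    """Learning rate at a given epoch under a StepLR-style schedule."""
    # The rate decays by `gamma` once per completed block of `step_size` epochs.
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    base_lr = 0.001  # optimizer.args.lr in the example config
    for epoch in (0, 49, 50, 100, 150, 999):
        print(f"epoch {epoch:4d}: lr = {step_lr(base_lr, epoch):.1e}")
```

With the example values, the rate has already dropped to 1e-6 by epoch 150, so most of the 1000 configured epochs would run at a very small learning rate; this is worth keeping in mind when changing `epochs` or `step_size`.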
Modify the configurations in the .yaml config files, then run:
python scripts/train_dis.py [CONFIG] [RESUME] [DEVICE] [BATCH_SIZE] [EPOCHS]
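Each component in the config (`arch`, `data_loader`, `optimizer`, `lr_scheduler`) is described by a `type` name plus an `args` mapping. A common way for a training script to consume this pattern is to look the name up and call it with `args` as keyword arguments. The sketch below assumes a plain dictionary registry, a hypothetical `init_obj` helper, and a stand-in `FakeAdam` class; it is not taken from this repository's code.

```python
from typing import Any, Callable, Dict


def init_obj(cfg: Dict[str, Any], registry: Dict[str, Callable[..., Any]],
             *args: Any, **kwargs: Any) -> Any:
    """Instantiate cfg['type'] from the registry with cfg['args'] as keyword arguments."""
    name = cfg["type"]
    if name not in registry:
        raise KeyError(f"unknown type {name!r}; known: {sorted(registry)}")
    call_kwargs = dict(cfg.get("args") or {})  # handles `args: {}` and a missing key
    # Keyword arguments supplied in code must not silently override the config file.
    clash = set(call_kwargs) & set(kwargs)
    if clash:
        raise ValueError(f"arguments set both in config and in code: {sorted(clash)}")
    call_kwargs.update(kwargs)
    return registry[name](*args, **call_kwargs)


class FakeAdam:
    """Hypothetical stand-in for an optimizer class, used only for illustration."""

    def __init__(self, params, lr=1e-3, weight_decay=0.0, eps=1e-8, betas=(0.9, 0.999)):
        self.params, self.lr, self.weight_decay = params, lr, weight_decay
        self.eps, self.betas = eps, tuple(betas)


optimizer_cfg = {
    "type": "Adam",
    "args": {"lr": 0.001, "weight_decay": 0, "eps": 1e-8, "betas": [0.9, 0.999]},
}
# Positional arguments (here a dummy parameter list) come from code, not from the config.
opt = init_obj(optimizer_cfg, {"Adam": FakeAdam}, ["w"])
print(opt.lr, opt.betas)
```

With PyTorch, such a registry could map `"Adam"` to `torch.optim.Adam` and `"StepLR"` to `torch.optim.lr_scheduler.StepLR`, passing the model parameters (or the optimizer) positionally. Keeping hyperparameters in `args` means an experiment can be changed by editing only the YAML file.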
You can resume training from a previously saved checkpoint with:
python scripts/train.py --resume path/to/the/ckpt
To test a trained model, run:
python scripts/test.py
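The example config evaluates with `mae` and `sm` (S-measure). MAE is the simpler of the two: the mean absolute difference between a predicted saliency map P and the binary ground-truth mask G, both scaled to [0, 1], i.e. MAE = (1 / (H × W)) × Σ |P(x, y) − G(x, y)|. Below is a minimal NumPy sketch, assuming 8-bit inputs are divided by 255 and float inputs are already in [0, 1]; the repository's own definition lives in model/README.md and may differ, e.g. in how predictions are normalized.

```python
import numpy as np


def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a saliency map and a ground-truth mask."""
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    # Scale 8-bit images to [0, 1]; other dtypes are assumed to be in [0, 1] already.
    p = pred.astype(np.float64) / 255.0 if pred.dtype == np.uint8 else pred.astype(np.float64)
    g = gt.astype(np.float64) / 255.0 if gt.dtype == np.uint8 else gt.astype(np.float64)
    return float(np.mean(np.abs(p - g)))


pred = np.array([[0.0, 1.0], [1.0, 0.0]])
gt = np.array([[0, 1], [0, 0]], dtype=np.uint8) * 255
print(mae(pred, gt))  # 0.25
```

Lower is better: a perfect prediction gives 0, and each wrongly labelled pixel adds 1 / (H × W) to the score.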
- Notebook demo: run inference.ipynb.
- Web interface: integrated into Hugging Face Spaces 🤗 using Gradio. Try out the Web Demo.
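U2Net-style models output a single-channel saliency map that is commonly min-max normalized before being saved or displayed (the original U2Net test script does this). A minimal sketch of that post-processing step, turning a raw prediction into an 8-bit mask plus a boolean foreground mask; the function names and the 0.5 threshold are hypothetical choices, not necessarily what inference.ipynb or the web demo use.

```python
from typing import Tuple

import numpy as np


def normalize_prediction(pred: np.ndarray) -> np.ndarray:
    """Min-max normalize a saliency map to [0, 1]."""
    pred = pred.astype(np.float64)
    lo, hi = pred.min(), pred.max()
    if hi - lo < 1e-12:
        # A constant map carries no saliency information; return all zeros.
        return np.zeros_like(pred)
    return (pred - lo) / (hi - lo)


def to_mask(pred: np.ndarray, threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
    """Return (8-bit grayscale mask, boolean foreground mask)."""
    norm = normalize_prediction(pred)
    gray = np.round(norm * 255).astype(np.uint8)
    return gray, norm >= threshold


raw = np.array([[-2.0, 0.0], [2.0, 6.0]])
gray, binary = to_mask(raw)
print(gray.tolist())    # [[0, 64], [128, 255]]
print(binary.tolist())  # [[False, False], [True, True]]
```

The 8-bit mask can be saved with Pillow via `Image.fromarray(gray)`, after resizing it back to the input image's size if the model ran at 320x320.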