/situation-recognition

Situation recognition with Graph Neural Network

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

About

This repository contains Python 3 scripts for situation recognition in images with a Graph Neural Network (GNN). The code is adapted from thilinicooray/context-aware-reasoning-for-sr.

Features

  • train the GNN model
  • analyze a subset of the dataset
  • analyze a single image that is not in the dataset

Requirements

  • PyTorch 1.6+

Check the PyTorch website for installation instructions.
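
To confirm that the installed PyTorch satisfies the 1.6+ requirement, something like the following can be used. This is a minimal sketch: `meets_minimum` is a hypothetical helper and not part of this repository, and it only compares the major and minor version numbers.

```python
import re


def meets_minimum(version, minimum=(1, 6)):
    """Return True if a version string such as '1.7.1+cu110' is >= minimum.

    Hypothetical helper (not part of sr.py): only major.minor is compared.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


try:
    import torch  # only needed for the live check below
    status = "OK" if meets_minimum(torch.__version__) else "too old"
    print(f"PyTorch {torch.__version__}: {status}")
except ImportError:
    print("PyTorch is not installed")
```

Comparing integer tuples avoids the pitfall of string comparison, where "1.10" would sort before "1.6".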

Get Started

  • Download the imSitu dataset and extract it into this repository.
  • Train the model from scratch, or download a pretrained one from here and put it in the saving folder (default: 'checkpoints' in this repo).
  • Use it!

Usage

$ python3 -u sr.py --resume_model="resnet152_sr" --test_img="giving_267.png"
train set stats: 
         verb count: 504 
         role count: 190
         label count: 2001
         max role count: 6
Resume training from: resnet152_sr
No ground truth verb found, calculating by myself...
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
Analizing:  giving_267.png

image

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
action (95.17%): paying
good (75.01%): -
place (79.91%): -
agent (62.36%): person
seller (79.63%): person
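
If the predictions are needed in another script, the printed lines can be parsed back into a dictionary. This is a rough sketch that assumes the `role (confidence%): value` layout shown above; `parse_predictions` is a hypothetical helper and not part of sr.py.

```python
import re

# Matches lines such as "action (95.17%): paying" from the output above.
LINE_RE = re.compile(r"^(?P<role>[\w-]+) \((?P<conf>\d+(?:\.\d+)?)%\): (?P<value>.+)$")


def parse_predictions(text):
    """Map each role to a (value, confidence in percent) tuple."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # banner and status lines do not match and are skipped
            results[match.group("role")] = (
                match.group("value"),
                float(match.group("conf")),
            )
    return results


sample = """\
Analizing:  giving_267.png
action (95.17%): paying
good (75.01%): -
place (79.91%): -
agent (62.36%): person
seller (79.63%): person
"""

parsed = parse_predictions(sample)
print(parsed["action"])  # ('paying', 95.17)
```

A value of `-` is kept as-is; in the output above it appears where no noun is predicted for a role.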

or

$ python3 -u sr.py --resume_model="resnet152_sr" --subset 2
Loading encoder file
Resume training from: resnet152_sr
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
Analizing:  shearing_226.jpg
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=256x256 at 0x2290D2B5BE0>
action (99.31%): shearing
item (99.98%): wool
place (98.81%): outdoors
agent (98.85%): man
source (99.63%): sheep
---- Ground truth ----
action: shearing
item = [wool, wool, wool]
place = [platform, outdoors, outdoors]
agent = [man, person, person]
source = [sheep, sheep, sheep]
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
Analizing:  celebrating_65.jpg
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=256x256 at 0x2290D2B5C40>
action (27.40%): congregating
individuals (91.47%): people
place (97.52%): outdoors
---- Ground truth ----
action: celebrating
occasion = [-, -, -]
place = [plaza, -, outdoors]
agent = [people, people, people]
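
Each role in the ground-truth block lists three annotations. A minimal sketch of comparing a prediction with them, assuming a predicted value counts as correct when it matches at least one of the three annotations (this rule is an assumption here, not taken from sr.py; `role_matches` is a hypothetical helper):

```python
def role_matches(predicted, annotations):
    """Assumed rule: a prediction is correct if any annotation agrees with it."""
    return predicted in annotations


# Values copied from the shearing_226.jpg output above.
prediction = {"item": "wool", "place": "outdoors", "agent": "man", "source": "sheep"}
ground_truth = {
    "item": ["wool", "wool", "wool"],
    "place": ["platform", "outdoors", "outdoors"],
    "agent": ["man", "person", "person"],
    "source": ["sheep", "sheep", "sheep"],
}

per_role = {
    role: role_matches(value, ground_truth[role])
    for role, value in prediction.items()
}
print(per_role)
```

Under this rule every role of the shearing example is correct, even though the annotations disagree with each other on `place` and `agent`.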

or

$ python -u sr.py --imgset_dir='resized_256' --dataset_folder='imSitu' --model_saving_name='resnet152_sr' --batch_size 6144
Loading encoder file
Using 4 GPUs!
Model training started!
Epoch-0, lr: 0.0020
training losses = [v: 6.27, n: 18.01, gt: 18.15]
1-verb: 0.33, 1-value: 34.73, 1-value-all: 6.36
5-verb: 1.67, 5-value: 73.02, 5-value-all: 17.32
gt-value: 33.24, gt-value-all: 6.29, mean = 21.62
--------------------------------------------------
val losses = [v: 6.20, n: 15.93, gt: 16.03]
1-verb: 0.55, 1-value: 51.82, 1-value-all: 13.15
5-verb: 2.48, 5-value: 88.33, 5-value-all: 26.71
gt-value: 49.19, gt-value-all: 10.93, mean = 30.40

or

$ python -u sr.py --imgset_dir='resized_256' --dataset_folder='imSitu' --resume_model='resnet152_sr' --batch_size 6144
Loading encoder file
Using 4 GPUs!
Resume training from: resnet152_sr
Model training started!
Epoch-30, lr: 0.0020
training losses = [v: 2.27, n: 9.42, gt: 7.95]
1-verb: 44.96, 1-value: 79.22, 1-value-all: 48.33
5-verb: 73.41, 5-value: 97.80, 5-value-all: 64.74
gt-value: 92.59, gt-value-all: 64.62, mean = 70.71
--------------------------------------------------
val losses = [v: 3.04, n: 10.08, gt: 8.00]
1-verb: 32.37, 1-value: 74.68, 1-value-all: 42.99
5-verb: 59.52, 5-value: 97.36, 5-value-all: 60.70
gt-value: 92.72, gt-value-all: 65.09, mean = 65.68
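
The `mean` value in the logs above appears to be the plain average of the eight reported metrics. The sketch below recomputes it from numbers copied out of the logs; that the script computes it exactly this way is an inference from the printed figures, not something confirmed from sr.py.

```python
from statistics import mean

# Metrics copied from the training logs above.
epoch0_train = {
    "1-verb": 0.33, "1-value": 34.73, "1-value-all": 6.36,
    "5-verb": 1.67, "5-value": 73.02, "5-value-all": 17.32,
    "gt-value": 33.24, "gt-value-all": 6.29,
}
epoch30_val = {
    "1-verb": 32.37, "1-value": 74.68, "1-value-all": 42.99,
    "5-verb": 59.52, "5-value": 97.36, "5-value-all": 60.70,
    "gt-value": 92.72, "gt-value-all": 65.09,
}

for name, metrics in [("epoch-0 train", epoch0_train), ("epoch-30 val", epoch30_val)]:
    # Average over all eight metrics, formatted like the log line.
    print(f"{name}: mean = {mean(list(metrics.values())):.2f}")
# epoch-0 train: mean = 21.62
# epoch-30 val: mean = 65.68
```

Both results agree with the `mean` fields printed in the corresponding log lines.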