Video Captioning Based on Both Egocentric and Exocentric Views of Robot Vision for Human-Robot Interaction
Robot vision data can be thought of as first-person (egocentric) video. One egocentric video can be described from three perspectives (a small annotation sketch follows this list):
- Global - describes the overall situation, including details such as the place, lighting, and weather.
- Action - describes what the subject, i.e. I, is doing.
- Interaction - describes the interaction or behavior between the subject, i.e. me, and others.
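As a rough illustration of these three caption types, the sketch below shows how one egocentric clip might be annotated. The key names and example sentences are hypothetical and are not the repository's actual annotation format.

```python
# Hypothetical annotation record for one egocentric clip. The key names and the
# example sentences are illustrative only; they are not the repository's actual
# caption format.
example_annotation = {
    "video_id": "UT_Ego_clip_0001",  # assumed naming, for illustration
    "global": "I am walking through a bright outdoor market on a sunny day.",
    "action": "I am picking up a piece of fruit from a stall.",
    "interaction": "I am handing money to the vendor and talking with him.",
}

for caption_type in ("global", "action", "interaction"):
    print(caption_type, "->", example_annotation[caption_type])
```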
Global Action Interaction (GAI)
- Python 3.6
- TensorFlow 1.5.0
- Download the UT Egocentric dataset and the [preprocessed dataset](https://drive.google.com/file/d/1IlX_WosLWfqRnIGIobI9gipZ8EGOJIUz/view?usp=sharing)
- Extract video features
$ python extract_RGB_feature.py
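As a rough sketch of what this step produces, the snippet below samples frames from a clip and extracts per-frame VGG fc-layer features. It assumes a Keras-style VGG16 from tf.keras.applications and 80 uniformly sampled frames per video; the repository's actual extract_RGB_feature.py may use a different VGG implementation, feature layer, and frame count.

```python
# Minimal sketch of per-frame RGB feature extraction; NOT the repository's
# extract_RGB_feature.py. It assumes a Keras-style VGG16 from
# tf.keras.applications and uniform sampling of 80 frames per clip.
import cv2
import numpy as np
import tensorflow as tf

NUM_FRAMES = 80  # assumed number of sampled frames per video

def sample_frames(video_path, num_frames=NUM_FRAMES):
    """Uniformly sample frames and resize them to the 224x224 VGG input size."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR
        frames.append(cv2.resize(frame, (224, 224)))
    cap.release()
    idx = np.linspace(0, len(frames) - 1, num_frames).astype(int)
    return np.array([frames[i] for i in idx], dtype=np.float32)

def extract_features(video_path):
    """Return a (num_frames, 4096) array of VGG16 fc2 activations."""
    vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
    fc2 = tf.keras.Model(vgg.input, vgg.get_layer("fc2").output)
    frames = tf.keras.applications.vgg16.preprocess_input(sample_frames(video_path))
    return fc2.predict(frames, batch_size=16)

if __name__ == "__main__":
    feats = extract_features("example_clip.mp4")   # hypothetical input path
    np.save("example_clip.npy", feats)             # one feature file per video
```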
- Train the model
$ python train.py
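For orientation, the snippet below is a minimal Keras-style sketch of the S2VT-like encoder-decoder idea: one LSTM encodes the frame features, a second LSTM decodes the caption word by word with teacher forcing. It is an illustration only; the repository's train.py builds a TensorFlow 1.x graph with an attention mechanism and its own vocabulary and data pipeline, and all sizes below are assumed.

```python
# Minimal Keras-style sketch of the S2VT-like idea: one LSTM encodes the frame
# features, a second LSTM decodes the caption word by word with teacher forcing.
# This is an illustration only; the repository's train.py builds a TensorFlow 1.x
# graph with an attention mechanism and its own data pipeline.
import numpy as np
import tensorflow as tf

NUM_FRAMES, FEAT_DIM = 80, 4096                 # matches the VGG fc-layer features
VOCAB_SIZE, MAX_WORDS, HIDDEN = 5000, 20, 512   # assumed sizes, for illustration

# Encoder: frame features -> final LSTM state.
feat_in = tf.keras.Input(shape=(NUM_FRAMES, FEAT_DIM))
_, h, c = tf.keras.layers.LSTM(HIDDEN, return_state=True)(feat_in)

# Decoder: previous caption words -> distribution over the next word.
word_in = tf.keras.Input(shape=(MAX_WORDS,))
emb = tf.keras.layers.Embedding(VOCAB_SIZE, HIDDEN)(word_in)
dec = tf.keras.layers.LSTM(HIDDEN, return_sequences=True)(emb, initial_state=[h, c])
probs = tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax")(dec)

model = tf.keras.Model([feat_in, word_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy batch just to show the expected tensor shapes.
feats = np.random.rand(2, NUM_FRAMES, FEAT_DIM).astype(np.float32)
words_in = np.random.randint(0, VOCAB_SIZE, size=(2, MAX_WORDS))
words_out = np.random.randint(0, VOCAB_SIZE, size=(2, MAX_WORDS))
model.fit([feats, words_in], words_out, epochs=1)
```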
- Test the model
$ python test.py
- S2VT model by chenxinpeng
- VGG model by AlonDaks/tsa-kaggle
- Attention mechanism by [AdrianHsu](https://github.com/AdrianHsu/S2VT-seq2seq-video-captioning-attention)
- Dataset from [UT Egocentric](http://vision.cs.utexas.edu/projects/egocentric/storydriven.html)