chiutaiyin/Vision-Skills

Vision-Skills

Code for the Vision Skills Needed to Answer Visual Questions

Requirements

tensorflow v1.14
keras v2.3.1 (for the pre-trained ResNet152 model)

Files

./

sample_code.ipynb: demo of skill prediction using ResNet152 feature maps.
build_vocab.ipynb: demo of how to create a vocabulary for your own task.

./csv

vizwiz_skill_typ_{test/train/val}.csv: skill annotations for the VizWiz dataset.
vqa_skill_typ_{test/train/val}.csv: skill annotations for selected images from the VQA2.0 dataset.

./utils

VqaQualityModel.py: model for vision skill prediction.
word2vocab_{vizwiz/vqa}: IDs of the tokenized frequent words in the questions of the VizWiz/VQA dataset.
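
As a rough illustration of what a word-to-ID mapping over frequent question words can look like, here is a minimal sketch. It is not the code from build_vocab.ipynb: the helper names `build_vocab` and `encode`, the regex tokenizer, the `min_count` threshold, and the reserved `<pad>`/`<unk>` IDs are all assumptions made for this example.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")  # assumed tokenizer: lowercase alphanumerics


def build_vocab(questions, min_count=2):
    """Map each frequent token to an integer ID (hypothetical helper)."""
    counts = Counter()
    for question in questions:
        counts.update(TOKEN_RE.findall(question.lower()))
    # Reserving 0 for padding and 1 for unknown words is an assumption.
    vocab = {"<pad>": 0, "<unk>": 1}
    # Most frequent first; ties broken alphabetically so IDs are deterministic.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(question, vocab):
    """Convert a question into a list of token IDs (hypothetical helper)."""
    tokens = TOKEN_RE.findall(question.lower())
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]


questions = [
    "What color is this shirt?",
    "What is this?",
    "How many cans is this?",
]
vocab = build_vocab(questions)
ids = encode("What is this bottle?", vocab)
```

Words that appear fewer than `min_count` times are mapped to the `<unk>` ID at encoding time; the threshold you need depends on the size of your own question set.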

./ckpt/{vizwiz/vqa}/{cnt/col/txt}: download here

Checkpoints for predicting the counting (cnt), color recognition (col), and text recognition (txt) skills on the VizWiz/VQA dataset, respectively.
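
The directory layout above can be enumerated with a short sketch like the following. The helper `checkpoint_dir` and the constants are hypothetical names for this example, and the code only builds paths; it assumes the checkpoints have already been downloaded into ./ckpt.

```python
import posixpath
from itertools import product

DATASETS = ("vizwiz", "vqa")
# Abbreviations taken from the layout above.
SKILLS = {
    "cnt": "counting",
    "col": "color recognition",
    "txt": "text recognition",
}


def checkpoint_dir(dataset, skill, root="./ckpt"):
    """Return the checkpoint directory for one dataset/skill pair (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if skill not in SKILLS:
        raise ValueError(f"unknown skill: {skill!r}")
    return posixpath.join(root, dataset, skill)


# All six dataset/skill combinations.
all_dirs = [checkpoint_dir(d, s) for d, s in product(DATASETS, SKILLS)]
```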

References