This repository holds the code for my master's report, completed in May 2019 for the MSIS degree with a specialization in Machine Learning at UT-Austin. The dataset used in this project will be published with the paper.
This research contributes to the understanding of the unique information needs and challenges faced by blind users, with the goal of improving the status quo of visual assistive technologies.
Image-question pairs from the VizWiz datasets pose significant challenges to existing machine learning algorithms because of inconsistent image quality and colloquial language. Visual question answering (VQA) tasks that involve crowdsourcing and community answering can also be better divided and assigned to appropriate crowd workers based on their experience, preferences, and skills. An algorithm that identifies the skills a question involves could potentially be transferred to other tasks that combine machine and human computation to better assist visually impaired users.
The original VQA evaluation repo is available at https://github.com/GT-Vision-Lab/VQA
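For orientation, that evaluation scores a predicted answer against the set of crowd answers collected for each question. Below is a minimal sketch of the commonly quoted form of the VQA accuracy metric; the function name is mine, and the official script additionally normalizes answers and averages over annotator subsets:

```python
def vqa_accuracy(predicted_answer, human_answers):
    """Commonly quoted VQA accuracy: full credit if at least 3 of the
    (typically 10) human annotators gave the predicted answer, and
    proportional partial credit below that threshold."""
    matches = sum(1 for ans in human_answers if ans == predicted_answer)
    return min(matches / 3.0, 1.0)
```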
- python 3.7.2
- scikit-image 0.14.0
- tensorflow 1.13
- keras 2.2.4
- scipy 1.2.0
- pandas 0.24.0
- opencv-python (imported as `cv2`)
```shell
pip3 install --upgrade pip
pip3 install -r requirements.txt
```
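The `requirements.txt` itself is not reproduced here; a version consistent with the dependency list above might look like the following. The exact patch pins, and `opencv-python` as the package providing `cv2`, are assumptions:

```
scikit-image==0.14.0
tensorflow==1.13.1
keras==2.2.4
scipy==1.2.0
pandas==0.24.0
opencv-python
```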