HotpotNet for the TextVQA Task

This is the project repository for Team Hotpot in CMU LTI's Spring 2022 course 11-777 Multimodal Machine Learning.

Our contribution is HotpotNet, a model that fuses features from multiple modalities to tackle visual question answering that requires reading the text in the question image.

File Structure

  • Reports: contains 1 final report and 3 intermediate reports that summarize the progress of our research and analysis throughout the semester.
  • Code: please refer to https://github.com/Willyoung2017/mmf_textvqa for our implementation and experiments with the baselines and our proposed model, HotpotNet.
  • Data: stores data downloaded from the official webpage of the TextVQA challenge (https://textvqa.org).
  • Data_Analysis: contains code for exploratory analysis of the data in Data.
  • Modal_Analysis: contains code and results of quantitative analysis of model outputs.