Textbook Question Answering

Model
Multi-stage pretraining for the text part
Dense layer of text-guided visual attention for the diagram part

Experiment
Conducted on a single Tesla V100

Results