kliu128/figuring-out-figures
Multimodal image + text captioning for 416k figures from arXiv. Uses CLIP + SciBERT + GPT-2 in an encoder-decoder architecture. CS224N final project.
Jupyter Notebook