Using LLMs and pre-trained caption models for super-human performance on image captioning.
Primary language: Python. License: Other (NOASSERTION).