MIL-VLG

Vision-and-Language Group, Media Intelligence Lab

Hangzhou Dianzi University, Hangzhou, China

Pinned Repositories

mlc-imp
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
Language: Python 0 0 00
activitynet-qa
A VideoQA dataset based on the videos from ActivityNet
Language: Python 67 3 69
bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
Language: Jupyter Notebook 292 2 9575
imp
A family of highly capable yet efficient large multimodal models
Language: Python 159 5 716
mcan-vqa
Deep Modular Co-Attention Networks for Visual Question Answering
Language: Python 442 6 3888
mmnas
Deep Multimodal Neural Architecture Search
Language: Python 26 1 118
openvqa
A lightweight, scalable, and general framework for visual question answering research
Language: Python 320 12 2964
prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Language: Python 264 3 4027
rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Language: Python 56 0 813
xmchat
30 1 52
