Pinned Repositories
acapp
Django web app
ACoLP
Open Set Video HOI detection from Action-centric Chain-of-Look Prompting (ICCV 2023)
ai-web-app
Django Artificial Intelligence Web App for Facial Expression Recognition (FER)
ARL
"Facial Action Unit Detection Using Attention and Relation Learning" (IEEE Transactions on Affective Computing)
Ask-Anything
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
atlas
Code repository supporting the paper "Atlas: Few-shot Learning with Retrieval Augmented Language Models" (https://arxiv.org/abs/2208.03299)
attention_branch_network
Attention Branch Network (CIFAR100, ImageNet models)
AU-Net
Towards robust facial action unit detection
Augmentation-Adapted-Retriever
[ACL 2023] Code for the paper "Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In".
awesome-visual-question-answering
A curated list of Visual Question Answering (VQA, covering image and video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
vhzy's Repositories
vhzy/Ask-Anything
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
vhzy/cutlass
CUDA Templates for Linear Algebra Subroutines
vhzy/CVPR24Track-LongVideo
vhzy/explore-eqa
Public release for "Explore until Confident: Efficient Exploration for Embodied Question Answering"
vhzy/flash-attention
Fast and memory-efficient exact attention
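As a quick orientation (a minimal sketch, not this repo's docs): the library's flash_attn_func computes exact attention without materializing the full attention matrix, and expects half-precision query/key/value tensors of shape (batch, seqlen, nheads, headdim) on a CUDA device. The sizes below are arbitrary placeholders.

import torch
from flash_attn import flash_attn_func

# Placeholder shapes: batch=2, seqlen=1024, 8 heads, head dim 64.
# FlashAttention requires fp16/bf16 tensors on a CUDA device.
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")

# Exact causal attention; output has the same shape as q.
out = flash_attn_func(q, k, v, causal=True)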
vhzy/IG-VLM
vhzy/JiuTian-LION
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
vhzy/Koala-video-llm
vhzy/LangRepo
Language Repository for Long Video Understanding
vhzy/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards GPT-4V level capabilities.
vhzy/LongVA
Long Context Transfer from Language to Vision
vhzy/LongVLM
vhzy/MA-LMM
[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
vhzy/MemVP
[ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
vhzy/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
vhzy/MiniGPT4-video
Official code for the Goldfish model (long video understanding) and MiniGPT4-video (short video understanding)
vhzy/MovieChat
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
vhzy/ms-swift
Use PEFT or Full-parameter to finetune 300+ LLMs or 60+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
vhzy/NExT-GQA
Can I Trust Your Answer? Visually Grounded VideoQA (Accepted to CVPR'24)
vhzy/PLLaVA
Official repository for the paper PLLaVA
vhzy/Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
vhzy/Sealing
[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
vhzy/self-rag
Original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through Self-Reflection, by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
vhzy/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
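For context, a minimal sketch of the library's high-level pipeline API; with no model specified, a default checkpoint for the task is downloaded on first use, and the exact labels/scores depend on that checkpoint.

from transformers import pipeline

# Task-level API: downloads a default sentiment model on first use.
classifier = pipeline("sentiment-analysis")
result = classifier("Video understanding with LLMs is making rapid progress.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]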
vhzy/Video-ChatGPT
"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
vhzy/Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
vhzy/Video-STaR
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
vhzy/VideoAgent
Official code for VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
vhzy/VideoTree
Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
vhzy/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
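As an illustration of vLLM's offline inference API (the model name below is an arbitrary small checkpoint chosen for the example):

from vllm import LLM, SamplingParams

# vLLM batches requests and manages KV-cache memory with PagedAttention.
llm = LLM(model="facebook/opt-125m")
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["Long videos are hard for LLMs because"], sampling)
for out in outputs:
    print(out.prompt, out.outputs[0].text)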