Pinned Repositories
Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Awesome-FGVC
Awesome-Fine-grained-Visual-Classification
Awesome Fine-grained Visual Classification
CER_Task
Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
email_processing
NLP_1
hammer
imp
A family of multimodal small language models
Koala-video-llm
MovieChat
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
Emiya-syw's Repositories
Emiya-syw/MovieChat
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
Emiya-syw/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Emiya-syw/Awesome-FGVC
Emiya-syw/Awesome-Fine-grained-Visual-Classification
Awesome Fine-grained Visual Classification
Emiya-syw/CER_Task
Emiya-syw/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Emiya-syw/email_processing
NLP_1
Emiya-syw/hammer
Emiya-syw/imp
A family of multimodal small language models
Emiya-syw/Koala-video-llm
Emiya-syw/llama
Inference code for LLaMA models
Emiya-syw/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Emiya-syw/nice_24_task_2
Emiya-syw/PaperReading
Paper Reading of IMCC groups.
Emiya-syw/SGTR
Emiya-syw/T-SciQ
Emiya-syw/TinyLLaVABench
A Framework of Small-scale Large Multimodal Models
Emiya-syw/TransFG
This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).
Emiya-syw/TTC-faster-rcnn
Emiya-syw/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Emiya-syw/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs