yepzhang

Pinned Repositories

tarsier
Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video understanding capability.
Language: Python 322 8 2319
ms-swift
Use PEFT or full-parameter training to fine-tune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
Language: Python 6.4k 33 1.9k547
InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Language: Python 1.7k 27 239105
CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
Language: Python 2.3k 29 182152
vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
Language: Python 63 3 43
LongVLM
Language: Python 92 4 107

yepzhang doesn’t have any repository yet.