Pinned Repositories
long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
DockerTarBuilder
A workflow for quickly building Docker images for a specified architecture/platform.
q_former
vicana-13b-pth
LLaVA-NeXT
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
MiniGPT4-video
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM