Pinned Repositories
Compressed-Video-Reader
A video reader for extracting motion vectors and residuals from encoded H.264 videos.
alpaca-lora
Instruct-tune LLaMA on consumer hardware
ChatBridge
ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations of paired data.
ChatSearch
ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval
LLaVA-NeXT-kv
VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
Kangaroo
Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input
DyCoke
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
LAMM
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
joez17's Repositories
joez17/ChatBridge
ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations of paired data.
joez17/VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
joez17/ChatSearch
ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval
joez17/alpaca-lora
Instruct-tune LLaMA on consumer hardware
joez17/LLaVA-NeXT-kv