Marcovaldon

Pinned Repositories

ADE-dataset
The first dataset for determining the alignment relation between visual and textual elements in diagrams, aimed at diagram understanding. We will release the dataset and code after the paper is accepted.
InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Language: Python
TabRecSet
A large-scale dataset of camera-captured tables for table detection and recognition.
Language: Python
ms-swift
Use PEFT or full-parameter training to fine-tune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 100+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
Language: Python
CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
Language: Python
surya
OCR, layout analysis, reading order, and table recognition in 90+ languages
Language: Python
vqaonline.github.io
Language: JavaScript
