Pinned Repositories
ADE-dataset
The first dataset for determining the alignment relations between visual and textual elements in diagrams, intended for diagram understanding. We will release the dataset and code after the paper is accepted.
M6Doc
InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
TabRecSet
A large-scale dataset of camera-captured tables for table detection and recognition.
swift
ms-swift: Use PEFT or full-parameter training to fine-tune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
surya
OCR, layout analysis, reading order, table recognition in 90+ languages
vqaonline.github.io
PlantCLEF2022
Marcovaldon's Repositories
Marcovaldon doesn’t have any repository yet.