may0324

tencentbeijing

may0324's Stars

wangwen-whu/WTW-Dataset
This is an official implementation for the WTW Dataset in "Parsing Table Structures in the Wild" on table detection and table structure recognition.
Language:Python16515
harrytea/Awesome-Document-Understanding
Document Artificial Intelligence
1325
cv-small-snails/Awesome-Table-Recognition
A curated list of resources dedicated to table recognition
37551
ucaslcl/Fox
official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
Language:Python1317
llavar/MMR_Bench
Language:Python41
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Language: Python · 6.2k stars · 534 forks
snap-research/Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Language:Python52919
Sanster/xy-cut
Language:Python7816
Yuliang-Liu/MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Language:Python47532
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
Language: Python · 4.5k stars · 391 forks
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Language: Python · 1.6k stars · 101 forks
X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Language: Python · 2.3k stars · 177 forks
Breakthrough/PySceneDetect
🎥 Python and OpenCV-based scene cut/transition detection program & library.
Language: Python · 3.3k stars · 402 forks
mst272/LLM-Dojo
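Welcome to LLM-Dojo, an open-source learning ground for large models. It uses concise, easy-to-read code to build a model training framework (supporting mainstream models such as Qwen, Llama, GLM, and more), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩‍🎓👨‍🎓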
Language:Python36332
meta-llama/llama
Inference code for Llama models
Language: Python · 56.6k stars · 9.6k forks
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
Language: Python · 6.4k stars · 637 forks
Vision-CAIR/MiniGPT4-video
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
Language:Python56161
Vision-CAIR/MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Language: Python · 25.5k stars · 2.9k forks
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Language: Jupyter Notebook · 15.3k stars · 1.4k forks
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Language: Python · 6.9k stars · 699 forks
showlab/VLog
Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.
Language:Python54526
showlab/UniVTG
[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
Language:Python32429
Albertsr/Anomaly-Detection
Unsupervised and Semi-Supervised Anomaly Detection / IsolationForest / KernelPCA Detection / ADOA / etc.
Language:Python29189
comfyanonymous/ComfyUI
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Language: Python · 58.7k stars · 6.2k forks
weijiawu/BOVText-Benchmark
[NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting
Language:Python676
weijiawu/TransVTSpotter
A new video text spotting framework with Transformer
Language:Python7711
LukeForeverYoung/UReader
Language:Python1269
TencentARC/VTLayout
3
yuxie11/R2D2
Language:Python15723
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook · 10k stars · 975 forks