fourierer's Stars
facebookresearch/detectron2
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
junyanz/pytorch-CycleGAN-and-pix2pix
Image-to-Image Translation in PyTorch
lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
idealo/imagededup
😎 Finding duplicate images made easy!
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
clovaai/deep-text-recognition-benchmark
Text recognition (optical character recognition) with deep learning methods, ICCV 2019
aim-uofa/AdelaiDet
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
modelscope/swift
ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
meijieru/crnn.pytorch
Convolutional recurrent neural network (CRNN) in PyTorch
microsoft/table-transformer
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
THUDM/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
MhLiao/DB
A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".
kingyiusuen/image-to-latex
Convert images of LaTeX math equations into LaTeX code.
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Sanster/text_renderer
Generate text images for training deep learning OCR models
oh-my-ocr/text_renderer
chineseocr/trocr-chinese
Transformer-based OCR for Chinese
TianzhongSong/awesome-SynthText
A curated list of awesome synthetic data for text location and recognition
ViTAE-Transformer/DeepSolo
The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting"
wenwenyu/TCM
Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)
mlpc-ucsd/TESTR
(CVPR 2022) Text Spotting Transformers
ymy-k/DPText-DETR
[AAAI'23 Oral] DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer
mxin262/ESTextSpotter
(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
j-river/svtr-pytorch
PyTorch version of the SVTR model
vincezengqiang/caffe_ocr
An experimental project for studying mainstream OCR algorithms; currently implements the CNN+BLSTM+CTC architecture
vincezengqiang/text_renderer
Generate text images for training deep learning OCR models
vincezengqiang/trocr-chinese
Transformer-based OCR for Chinese