zhangjx123's Stars
abi/screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
pyecharts/pyecharts
🎨 Python Echarts Plotting Library
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
facebookresearch/nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Millions Context
mit-han-lab/efficientvit
Efficient vision foundation models for high-resolution generation and perception.
MhLiao/DB
A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".
Gsllchb/Handright
A lightweight Python library for simulating Chinese handwriting
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Yuliang-Liu/Monkey
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
pyecharts/pyecharts-gallery
Recreations of the official ECharts examples using pyecharts.
KangLiao929/Awesome-Deep-Camera-Calibration
Deep Learning for Camera Calibration and Beyond: A Survey
hsfzxjy/handwriter.ttf
Handwriting synthesis with Harfbuzz WASM.
fh2019ustc/Awesome-Document-Image-Rectification
A comprehensive list of awesome document image rectification papers.
alvinwan/TexSoup
fault-tolerant Python3 package for searching, navigating, and modifying LaTeX documents
ymy-k/Hi-SAM
[TPAMI'24] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
xinke-wang/Awesome-Text-VQA
UniModal4Reasoning/DocGenome
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
pdf-association/pdf-corpora
An index of PDF-centric corpora
Kamino666/watermark-tracer
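A digital media source-tracing application based on visible watermark detection and recognition. It is my course project and includes both the system and an open-source large-scale dataset of common watermark images (Large-scale Common Watermark Dataset, LCWD). Given an image or video carrying a visible watermark, the system detects and localizes the watermark region, extracts it, and then uses the Baidu AI Open Platform's OCR and logo recognition together with the Bing search engine to trace the image or video back to its source.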
mxin262/ESTextSpotter
(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
mxin262/Bridging-Text-Spotting
(CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting.
bytedance/E2STR
The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
shannanyinxiang/UPOCR
Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)
Yuliang-Liu/Open-Oracle
AI-assisted Deciphering Oracle Bone Script
bzluan/TextCoT
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
ZZZHANG-jx/Awesome-Document-Image-Rectification
A comprehensive list of awesome document image rectification papers.
mxin262/Monkey
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models