lyhh123's Stars
UniModal4Reasoning/DocGenome
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
dailenson/One-DM
Official code for the ECCV 2024 paper "One-Shot Diffusion Mimicker for Handwritten Text Generation"
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Topdu/OpenOCR
hiroi-sora/GapTree_Sort_Algorithm
[Gap-Tree Sort Algorithm] Performs layout analysis on OCR results or text extracted from PDFs and sorts the text into human reading order.
CVCUDA/CV-CUDA
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
NaiboWang/EasySpider
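A visual no-code/code-free web crawler/spider. 易采集 (EasySpider): a visual browser automation testing, data collection, and web crawling tool that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.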
lyhh123/MTF-110K
A Comprehensive Dataset for Mixed Text and Formula Recognition in Educational and Scientific Documents
dailenson/SDT
This repository is the official implementation of Disentangling Writer and Character Styles for Handwriting Generation (CVPR 2023)
lllyasviel/IC-Light
More relighting!
whai362/PVT
Official implementation of PVT series
PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system; supports recognition of 80+ languages; provides data annotation and synthesis tools; supports training and deployment across server, mobile, embedded, and IoT devices)
binary-husky/gpt_academic
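Provides a practical interactive interface for large language models such as GPT and GLM, with an experience specially optimized for reading, polishing, and writing papers. Modular design; supports custom shortcut buttons and function plugins, analysis and self-explanation of Python, C++, and other codebases, translation and summarization of PDF/LaTeX papers, parallel querying of multiple LLMs, and local models such as chatglm3. Integrates 通义千问 (Tongyi Qianwen), deepseekcoder, 讯飞星火 (iFlytek Spark), 文心一言 (ERNIE Bot), llama2, rwkv, claude2, moss, and more.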
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal conversational model with performance approaching GPT-4o.
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
LiuHC0428/LAW-GPT
A Chinese legal conversational language model
Ucas-HaoranWei/Vary-toy
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
facebookresearch/nougat
Implementation of Nougat: Neural Optical Understanding for Academic Documents
Yuliang-Liu/MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
meta-llama/llama3
The official Meta Llama 3 GitHub site
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Yuliang-Liu/Monkey
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Pythagora-io/gpt-pilot
The first real AI developer
chineseocr/trocr-chinese
Transformers OCR for Chinese
mamba-org/mamba
The Fast Cross-Platform Package Manager
dair-ai/ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
thuml/iTransformer
Official implementation for "iTransformer: Inverted Transformers Are Effective for Time Series Forecasting" (ICLR 2024 Spotlight), https://openreview.net/forum?id=JePfAI8fah
megvii-research/NAFNet
The state-of-the-art image restoration model without nonlinear activation functions.
HCIILAB/Scene-Text-Recognition