xiaomei1995

xiaomei1995's Stars

PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that recognizes 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
Language: Python 42.6k 437 9.2k 7.7k
chatchat-space/Langchain-Chatchat
Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent application built with Langchain and LLMs such as ChatGLM, Qwen, and Llama
Language: TypeScript 31.2k 282 3.8k 5.4k
KurtBestor/Hitomi-Downloader
:cake: Desktop utility to download images/videos/music/text from various websites, and more.
Language: Python 21.7k 238 6.5k 2k
ocrmypdf/OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Language: Python 13.6k 136 1.2k 995
facebookresearch/nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
Language: Python 8.7k 63 208560
jsvine/pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Language: Python 6.3k 91 549653
pengxiao-song/LaWGPT
🎉 Repo for LaWGPT, a Chinese-Llama model tuned with Chinese legal knowledge: a large language model based on Chinese legal knowledge
Language: Python 5.8k 49 123530
gaotianliuyun/gao
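Configuration files for FongMi TV (影视) and tvbox. If you like them, please fork the repository for your own use. Read the repository instructions carefully before use; using the files will be taken to mean that you have understood them.
Language: JavaScript 5.4k 77 902.2k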
649453932/Chinese-Text-Classification-Pytorch
Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer, based on PyTorch and ready to use out of the box.
Language: Python 5.3k 35 1171.2k
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
Language: Python 4.8k 72 148459
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Language: Python 4.8k 49 430360
OpenBMB/ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
Language: Python 4.7k 49 285401
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
Language: Python 4.1k 40 394294
indiff/qttabbar
QTTabBar is a small tool that adds multi-tab browsing to Windows Explorer. https://www.yuque.com/indiff/qttabbar
Language: C# 3.7k 45 368266
kermitt2/grobid
Machine learning software for extracting information from scholarly documents
Language: Java 3.4k 96 860444
breezedeus/Pix2Text
An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
Language: Jupyter Notebook 1.8k 16 77175
EleutherAI/the-pile
Language: Python 1.5k 31 100127
alibaba/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Language: Python 1.3k 12 11878
doc-analysis/TableBank
TableBank: A Benchmark Dataset for Table Detection and Recognition
1k 37 44141
AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Language: Python 894 12 1953
mammothb/symspellpy
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Language: Python 791 16 91116
codefuse-ai/MFTCoder
A high-accuracy, efficient multi-task fine-tuning framework for Code LLMs. This work has been accepted by KDD 2024.
Language: Python 615 8 4665
moon-hotel/BertWithPretrained
An implementation of the BERT model and its related downstream tasks based on the PyTorch framework
Language: Python 549 5 21107
SupritYoung/RLHF-Label-Tool
A tool for manually annotating and ranking response data in the RLHF stage of large-model training.
Language: Python 240 5 217
RapidAI/RapidStructure
Layout analysis | Table recognition | Document orientation classification
Language: Python 182 6 1614
phamquiluan/PubLayNet
ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,...
Language: Python 176 5 1539
rockyzhengwu/document-ocr
A relatively complete document analysis and recognition project
Language: Python 143 10 538
luckydog5/TabelDetection
Using deep learning to detect tables in document images
Language: Python 2518
JovenChu/vector_stores_test
A workflow for converting text to vectors and storing them with milvus and faiss, plus a simple performance test
Language: Python 72
RebaekSoparkKitchen/duidui
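Chinese proofreading: automatically detects and corrects paragraphs, punctuation, and spaces; searches for words banned in advertising and detects large duplicated passages.
Language: TypeScript 7 1 11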